1
00:00:00,000 --> 00:00:01,810

2
00:00:01,810 --> 00:00:04,240
The following content is
provided under a Creative

3
00:00:04,240 --> 00:00:05,650
Commons license.

4
00:00:05,650 --> 00:00:08,680
Your support will help MIT
OpenCourseWare continue to

5
00:00:08,680 --> 00:00:12,340
offer high quality educational
resources for free.

6
00:00:12,340 --> 00:00:15,230
To make a donation or view
additional materials from

7
00:00:15,230 --> 00:00:19,160
hundreds of MIT courses, visit
MIT OpenCourseWare at

8
00:00:19,160 --> 00:00:20,410
ocw.mit.edu.

9
00:00:20,410 --> 00:00:26,000

10
00:00:26,000 --> 00:00:27,220
CHARLES LEISERSON: So today
we're going to take a little

11
00:00:27,220 --> 00:00:33,120
bit closer look at what's
happening under the covers

12
00:00:33,120 --> 00:00:37,240
when you compile a C program.

13
00:00:37,240 --> 00:00:42,080
But before we get into that,
we did a little interesting

14
00:00:42,080 --> 00:00:51,160
correlation on your scores for
the first problem, for the

15
00:00:51,160 --> 00:00:52,760
every bit one.

16
00:00:52,760 --> 00:00:55,810
So this is basically plotting.

17
00:00:55,810 --> 00:01:00,350
It's a scatter plot of how you
did in your test coverage

18
00:01:00,350 --> 00:01:07,410
score versus how you did in
your correctness score and

19
00:01:07,410 --> 00:01:11,050
performance, correctness and
performance together.

20
00:01:11,050 --> 00:01:15,270
And what's interesting is that
if you did better in your test

21
00:01:15,270 --> 00:01:19,940
coverage, you did better
in your performance and

22
00:01:19,940 --> 00:01:22,161
correctness.

23
00:01:22,161 --> 00:01:24,640
OK, that's a pretty good
correlation, right?

24
00:01:24,640 --> 00:01:26,465
There's some outliers here.

25
00:01:26,465 --> 00:01:27,925
But that's a pretty
good correlation.

26
00:01:27,925 --> 00:01:36,170

27
00:01:36,170 --> 00:01:36,890
Yeah, John?

28
00:01:36,890 --> 00:01:37,643
Yeah?

29
00:01:37,643 --> 00:01:38,106
JOHN: The--

30
00:01:38,106 --> 00:01:39,340
CHARLES LEISERSON: Do
we have a handheld?

31
00:01:39,340 --> 00:01:40,360
Here we go.

32
00:01:40,360 --> 00:01:41,610
Just a second.

33
00:01:41,610 --> 00:01:47,660

34
00:01:47,660 --> 00:01:49,610
The only thing is we have to
figure out how to turn it on.

35
00:01:49,610 --> 00:01:50,860
There we go.

36
00:01:50,860 --> 00:01:52,885

37
00:01:52,885 --> 00:01:55,640
JOHN: Yeah, so just to clarify,
the people who

38
00:01:55,640 --> 00:01:58,515
seemingly got no test coverage
but really good performance

39
00:01:58,515 --> 00:02:03,420
scores, what actually happened
was that we tested for things

40
00:02:03,420 --> 00:02:07,630
that we didn't expect students
to cover for like on feeding

41
00:02:07,630 --> 00:02:12,510
invalid values to better a set
and better a get or testing

42
00:02:12,510 --> 00:02:14,830
for private functions to
their implementations.

43
00:02:14,830 --> 00:02:19,190
So in reality they had better
test suites than the coverage

44
00:02:19,190 --> 00:02:20,440
score would indicate.

45
00:02:20,440 --> 00:02:26,970

46
00:02:26,970 --> 00:02:28,140
CHARLES LEISERSON: So what
are the lessons that

47
00:02:28,140 --> 00:02:31,430
one draws from this?

48
00:02:31,430 --> 00:02:35,270
So professional engineers know
what the lessons are.

49
00:02:35,270 --> 00:02:39,320
So the lessons are that it is
actually better, if you have a

50
00:02:39,320 --> 00:02:44,910
coding problem to do, to
write tests first.

51
00:02:44,910 --> 00:02:48,850
Before you code you
write your tests.

52
00:02:48,850 --> 00:02:52,360
And that actually speeds
the development of

53
00:02:52,360 --> 00:02:53,880
fast correct code.

54
00:02:53,880 --> 00:02:55,080
It's actually faster.

55
00:02:55,080 --> 00:02:58,340
You get to the end result
much faster.

56
00:02:58,340 --> 00:03:02,820
Because whenever you make an
error in your program, you

57
00:03:02,820 --> 00:03:07,360
instantly know that you may
have a problem rather than

58
00:03:07,360 --> 00:03:10,380
thinking that you're doing
something OK and then

59
00:03:10,380 --> 00:03:15,120
discovering that, oh, in fact
your code is, in fact,

60
00:03:15,120 --> 00:03:18,120
incorrect, and you're working
away optimizing something

61
00:03:18,120 --> 00:03:25,030
that's not working So before
coding, it's highly

62
00:03:25,030 --> 00:03:26,620
recommended that
you write test.

63
00:03:26,620 --> 00:03:31,740
Also if you find a bug, when
you find a bug, the first

64
00:03:31,740 --> 00:03:35,840
thing you should do is write a
test for that bug if it wasn't

65
00:03:35,840 --> 00:03:38,310
already covered.

66
00:03:38,310 --> 00:03:40,490
Then you fix the bug.

67
00:03:40,490 --> 00:03:43,860
And then you make sure that your
test now, that your new

68
00:03:43,860 --> 00:03:46,830
implementation, passes
that particular one.

69
00:03:46,830 --> 00:03:48,850
Professional engineers
know this.

70
00:03:48,850 --> 00:03:52,310
Professional software developers
know this.

71
00:03:52,310 --> 00:03:54,360
It comes hard.

72
00:03:54,360 --> 00:03:59,760
And if you want a job at any
top flight software firm,

73
00:03:59,760 --> 00:04:02,520
they're going to expect that you
know that you write tests

74
00:04:02,520 --> 00:04:04,990
first before you do coding.

75
00:04:04,990 --> 00:04:08,100

76
00:04:08,100 --> 00:04:12,030
The second lesson isn't quite
so obvious from this.

77
00:04:12,030 --> 00:04:14,780
But it's the second lesson
that I think some people

78
00:04:14,780 --> 00:04:20,120
experienced in the class which
was the idea of putting you in

79
00:04:20,120 --> 00:04:22,800
groups, in particular in pairs,
was not so that you

80
00:04:22,800 --> 00:04:26,930
could do divide and conquer
on the code.

81
00:04:26,930 --> 00:04:29,550
It was to do pair programming.

82
00:04:29,550 --> 00:04:33,380
And what we found was that
a bunch of groups

83
00:04:33,380 --> 00:04:35,540
divided up the work.

84
00:04:35,540 --> 00:04:38,150
And they said, OK, it'll go
faster if you do this one and

85
00:04:38,150 --> 00:04:39,800
I do that one.

86
00:04:39,800 --> 00:04:42,810
Once again that's probably
a mistake.

87
00:04:42,810 --> 00:04:47,010
If you can sit together and
take turns at the keyboard

88
00:04:47,010 --> 00:04:49,890
making the changes, it may seem
like it's going slower to

89
00:04:49,890 --> 00:04:53,580
begin with, but it's amazing how
many errors you catch and

90
00:04:53,580 --> 00:04:57,890
how quickly you find your errors
because you're just

91
00:04:57,890 --> 00:04:59,560
talking with each other.

92
00:04:59,560 --> 00:05:02,990
And it's like, oh, duh.

93
00:05:02,990 --> 00:05:06,860
So good programmers know this.

94
00:05:06,860 --> 00:05:10,020
That it really helps to have
more than one person

95
00:05:10,020 --> 00:05:12,560
understand what's going
on in the code.

96
00:05:12,560 --> 00:05:15,580
So the people who had difficulty
with their partners

97
00:05:15,580 --> 00:05:19,660
one way or another often did
not, it was partly because

98
00:05:19,660 --> 00:05:22,560
they just divided up the work,
you're responsible for that,

99
00:05:22,560 --> 00:05:25,290
oh, we got a bad grade on
that, that's your fault.

100
00:05:25,290 --> 00:05:29,540
No, both partners own
that grade 100%.

101
00:05:29,540 --> 00:05:35,220
And the best way to ensure
is to work together.

102
00:05:35,220 --> 00:05:39,060
Now, this sometimes flies in the
face of people who believe

103
00:05:39,060 --> 00:05:42,440
that they are clever or more
experienced than somebody,

104
00:05:42,440 --> 00:05:47,720
than their partner, oh, I can do
this much better on my own.

105
00:05:47,720 --> 00:05:51,370
Usually, that's true for little
projects, but as the

106
00:05:51,370 --> 00:05:59,390
projects get bigger that
becomes a much harder

107
00:05:59,390 --> 00:06:01,350
situation to deal with.

108
00:06:01,350 --> 00:06:04,980
It becomes the case that you
really want two brains looking

109
00:06:04,980 --> 00:06:09,100
at the same thing, four
eyes as opposed to two

110
00:06:09,100 --> 00:06:12,310
eyes looking at things.

111
00:06:12,310 --> 00:06:16,060
But I think, in particular,
before coding write test.

112
00:06:16,060 --> 00:06:18,320
And we are right now working
on improving the

113
00:06:18,320 --> 00:06:19,620
infrastructure.

114
00:06:19,620 --> 00:06:24,940
One of the things that they have
in most companies is, at

115
00:06:24,940 --> 00:06:29,510
the very minimum, they have
what's called a nightly build.

116
00:06:29,510 --> 00:06:32,710
Nightly build says they take all
the software, they build

117
00:06:32,710 --> 00:06:35,800
it, and then they run regression
tests against it

118
00:06:35,800 --> 00:06:37,990
all night while everybody's
home sleeping.

119
00:06:37,990 --> 00:06:40,480
Come in the next morning, here's
the things that broke.

120
00:06:40,480 --> 00:06:44,120
And if you broke the
build, you got some

121
00:06:44,120 --> 00:06:46,760
work to do that morning.

122
00:06:46,760 --> 00:06:50,260
And it's generally not a good
idea to break the build.

123
00:06:50,260 --> 00:06:54,050
What has been demonstrating,
in fact, is that continuous

124
00:06:54,050 --> 00:06:56,520
build is even better.

125
00:06:56,520 --> 00:06:58,940
This is where, whenever you make
a change to the program,

126
00:06:58,940 --> 00:07:02,250
you run the full suite
of tests on it.

127
00:07:02,250 --> 00:07:04,160
And we're going to
look into it.

128
00:07:04,160 --> 00:07:05,830
We have to see what
our resources are.

129
00:07:05,830 --> 00:07:09,660
As you know, our TAs are
a limited resource.

130
00:07:09,660 --> 00:07:11,820
But we're going to look into
seeing whether we can provide

131
00:07:11,820 --> 00:07:14,030
more of that kind of
infrastructure on some of the

132
00:07:14,030 --> 00:07:15,940
later projects for you folks.

133
00:07:15,940 --> 00:07:18,190
So you can sort of see
the matrix that we

134
00:07:18,190 --> 00:07:19,370
eventually got to you.

135
00:07:19,370 --> 00:07:21,920
You can see that develop
in real time.

136
00:07:21,920 --> 00:07:24,650
How am I doing against
other people's tests?

137
00:07:24,650 --> 00:07:26,740
How are they doing against
my tests, et cetera?

138
00:07:26,740 --> 00:07:29,600

139
00:07:29,600 --> 00:07:31,410
So we'll see whether
we can do that.

140
00:07:31,410 --> 00:07:34,810
But what's funny is you think
that it'd be faster to just

141
00:07:34,810 --> 00:07:35,950
code and do it.

142
00:07:35,950 --> 00:07:39,120
Computer science is full
of wonderful paradoxes.

143
00:07:39,120 --> 00:07:44,850
And one of them is that doing
things like writing the extra

144
00:07:44,850 --> 00:07:51,120
code to test is actually faster
than not writing it,

145
00:07:51,120 --> 00:07:52,650
surprisingly.

146
00:07:52,650 --> 00:07:55,460
It really gets you to the
end result a lot faster.

147
00:07:55,460 --> 00:07:56,710
Any questions about that?

148
00:07:56,710 --> 00:07:59,200

149
00:07:59,200 --> 00:08:00,450
Any comments about that?

150
00:08:00,450 --> 00:08:03,410

151
00:08:03,410 --> 00:08:08,330
Let's talk about our today.

152
00:08:08,330 --> 00:08:11,000
So today we're going to talk
mostly about single threaded

153
00:08:11,000 --> 00:08:12,990
performance.

154
00:08:12,990 --> 00:08:15,020
This is one instruction
stream that you're

155
00:08:15,020 --> 00:08:16,970
trying to make go fast.

156
00:08:16,970 --> 00:08:23,620
But if you look at today's
computing milieu, how all of

157
00:08:23,620 --> 00:08:26,380
the computers are used,
what do you have?

158
00:08:26,380 --> 00:08:29,650
You've got networks of
multi-core clusters.

159
00:08:29,650 --> 00:08:32,270
It's parallelism everywhere.

160
00:08:32,270 --> 00:08:35,950
You've got shared memory among
processors within a chip.

161
00:08:35,950 --> 00:08:39,409
You've got message passing among
machines in a cluster.

162
00:08:39,409 --> 00:08:42,210
You've got network protocols
among clusters so that you can

163
00:08:42,210 --> 00:08:47,400
do wide area things.

164
00:08:47,400 --> 00:08:50,620
Yet we're saying, no, let's take
a look at what happens on

165
00:08:50,620 --> 00:08:54,320
one core on one machine.

166
00:08:54,320 --> 00:08:59,920
So why is that important
to focus first on what

167
00:08:59,920 --> 00:09:02,590
one core can do?

168
00:09:02,590 --> 00:09:05,530
Why study single threaded
performance at all?

169
00:09:05,530 --> 00:09:06,920
Let's just go do the
parallel stuff.

170
00:09:06,920 --> 00:09:08,170
That's more fun anyway.

171
00:09:08,170 --> 00:09:10,820

172
00:09:10,820 --> 00:09:13,630
Well, there are a couple of
reasons that I can think of.

173
00:09:13,630 --> 00:09:22,010
The first one is at the end of
the day, even if you've got

174
00:09:22,010 --> 00:09:27,230
something running widely in
parallel, the code is running

175
00:09:27,230 --> 00:09:29,560
in each core in a single
threaded manner.

176
00:09:29,560 --> 00:09:32,050
You just have a bunch of them.

177
00:09:32,050 --> 00:09:36,600
And so if you've given up a
factor of two or a factor of

178
00:09:36,600 --> 00:09:40,690
four in performance or even
more, as you're aware, you can

179
00:09:40,690 --> 00:09:44,240
sometimes make it orders of
magnitude, but in performance

180
00:09:44,240 --> 00:09:47,010
what you're saying is that
you're going to end up using

181
00:09:47,010 --> 00:09:52,060
much more resources to do your
particular job in parallel.

182
00:09:52,060 --> 00:09:53,690
And resources is money.

183
00:09:53,690 --> 00:10:03,360
So if I can do the job with a
cluster of 16 processors and

184
00:10:03,360 --> 00:10:06,000
somebody else can do it in
a cluster with only four

185
00:10:06,000 --> 00:10:12,360
processors, hey, they just spent
a quarter the amount on

186
00:10:12,360 --> 00:10:17,080
not just the capital investment
in that hardware

187
00:10:17,080 --> 00:10:21,720
but also the operating costs
of what it cost to actually

188
00:10:21,720 --> 00:10:27,040
cool and provide electricity to
and maintain and so forth.

189
00:10:27,040 --> 00:10:29,050
All that gets much cheaper.

190
00:10:29,050 --> 00:10:31,100
So if you get good single
thread performance, it

191
00:10:31,100 --> 00:10:31,730
translates.

192
00:10:31,730 --> 00:10:36,300
That's kind of the
direct reason.

193
00:10:36,300 --> 00:10:40,350
The indirect reason is, for
studying it, is that many of

194
00:10:40,350 --> 00:10:42,160
the lessons will generalize.

195
00:10:42,160 --> 00:10:44,310
So things that we'll see for
single core, there is an

196
00:10:44,310 --> 00:10:47,890
analogy when you start
looking at parallel

197
00:10:47,890 --> 00:10:50,640
and distributed systems.

198
00:10:50,640 --> 00:10:53,320
So that's a little
less concrete.

199
00:10:53,320 --> 00:10:57,360
But as you'll see as you gain
experience, you'll see that

200
00:10:57,360 --> 00:11:00,980
there's a lot of lessons that
generalize to how do you think

201
00:11:00,980 --> 00:11:05,715
about performance no matter
what the context.

202
00:11:05,715 --> 00:11:09,660

203
00:11:09,660 --> 00:11:12,860
So what about a single
threaded machine?

204
00:11:12,860 --> 00:11:14,130
What's it like?

205
00:11:14,130 --> 00:11:16,370
So some of this is going to be a
little bit review, but we're

206
00:11:16,370 --> 00:11:17,590
going to just sort
of go deeper.

207
00:11:17,590 --> 00:11:20,330
We've sort of been taking
layers off the onion.

208
00:11:20,330 --> 00:11:22,210
And today we're going
to take a few more

209
00:11:22,210 --> 00:11:23,960
layers off the onion.

210
00:11:23,960 --> 00:11:26,680
So you have inside
a processor core.

211
00:11:26,680 --> 00:11:28,340
You've got registers.

212
00:11:28,340 --> 00:11:30,930
You've got the functional
units to do your ALU

213
00:11:30,930 --> 00:11:34,880
operations, floating point
units, vector units these

214
00:11:34,880 --> 00:11:39,760
days, and all the stuff to do
instruction, execution, and

215
00:11:39,760 --> 00:11:42,300
coordination, scheduling,
out of order

216
00:11:42,300 --> 00:11:45,100
execution, and so forth.

217
00:11:45,100 --> 00:11:48,510
In addition then, you have
a memory hierarchy.

218
00:11:48,510 --> 00:11:50,780
Within the core, you typically
have registers

219
00:11:50,780 --> 00:11:52,520
and L1 and L2 caches.

220
00:11:52,520 --> 00:11:57,910
And then outside the core
often is the L3 cache.

221
00:11:57,910 --> 00:12:00,480
DRAM memory, you may have
a solid-state drive

222
00:12:00,480 --> 00:12:02,780
these days and disk.

223
00:12:02,780 --> 00:12:05,240
And so in that context,
you're trying to make

224
00:12:05,240 --> 00:12:06,490
your code run fast.

225
00:12:06,490 --> 00:12:09,540

226
00:12:09,540 --> 00:12:15,100
So when you compile the piece
of code, so here I have a

227
00:12:15,100 --> 00:12:15,660
piece of code.

228
00:12:15,660 --> 00:12:19,370
I'm always amused when I put up
a Fibonacci as the example.

229
00:12:19,370 --> 00:12:23,210
Because this is a really
terrible way to compute

230
00:12:23,210 --> 00:12:25,580
Fibonacci numbers.

231
00:12:25,580 --> 00:12:28,530
So this is an exponential time
algorithm for computing

232
00:12:28,530 --> 00:12:30,120
Fibonacci numbers.

233
00:12:30,120 --> 00:12:35,100
And you may be aware you can do
this in linear time just by

234
00:12:35,100 --> 00:12:36,910
adding up from the bottom.

235
00:12:36,910 --> 00:12:39,070
In fact, if you take the
algorithms course, you learn

236
00:12:39,070 --> 00:12:44,220
that you can actually do this in
logarithmic time by matrix,

237
00:12:44,220 --> 00:12:46,770
recursive squaring
of matrices.

238
00:12:46,770 --> 00:12:49,510
So it's sort of interesting to
put up something where we say

239
00:12:49,510 --> 00:12:52,100
we're going to optimize this.

240
00:12:52,100 --> 00:12:54,570
And, of course, we'll get a
constant factor improvement on

241
00:12:54,570 --> 00:12:55,800
something like this.

242
00:12:55,800 --> 00:13:01,610
But, in fact, really this is a
terrible program to write for

243
00:13:01,610 --> 00:13:02,380
optimization.

244
00:13:02,380 --> 00:13:04,720
But it's good didactically.

245
00:13:04,720 --> 00:13:07,460
And Fibonacci numbers
are fun anyway.

246
00:13:07,460 --> 00:13:13,120
So typically what happens is
when you run GCC on your .C

247
00:13:13,120 --> 00:13:19,980
file and produce a binary, what
happens is it produces

248
00:13:19,980 --> 00:13:23,010
the machine code, which is
basically a string of bytes,

249
00:13:23,010 --> 00:13:24,810
zeros and ones.

250
00:13:24,810 --> 00:13:28,570
And that goes, when you run the
program, that goes through

251
00:13:28,570 --> 00:13:30,650
the hardware interpreter.

252
00:13:30,650 --> 00:13:33,430
So the hardware of the machine
is doing an interpretation of

253
00:13:33,430 --> 00:13:37,150
these very simple instructions
and produces an execution.

254
00:13:37,150 --> 00:13:40,980
But, in fact, there's actually
four stages that go on inside

255
00:13:40,980 --> 00:13:44,710
of GCC if you type a
command like this.

256
00:13:44,710 --> 00:13:47,320
The first thing is what's
called preprocessing.

257
00:13:47,320 --> 00:13:51,360
And what that does is it does
any macro expansion and so

258
00:13:51,360 --> 00:13:55,280
forth, things that are just
basically on the level of

259
00:13:55,280 --> 00:13:57,800
textual substitutions
before you get into

260
00:13:57,800 --> 00:13:59,210
the guts of the compiler.

261
00:13:59,210 --> 00:14:02,310
Then you actually
do the compiler.

262
00:14:02,310 --> 00:14:07,930
And that produces a version of
machine code called assembly

263
00:14:07,930 --> 00:14:11,140
language, which we'll see
in just a minute.

264
00:14:11,140 --> 00:14:16,940
And from that version of
assembly language it then goes

265
00:14:16,940 --> 00:14:21,390
into a process called linking
and loading, which actually

266
00:14:21,390 --> 00:14:24,280
causes it to produce
the binary that

267
00:14:24,280 --> 00:14:27,510
you can then execute.

268
00:14:27,510 --> 00:14:30,350
So all four stages are
included here.

269
00:14:30,350 --> 00:14:36,070
And there are switches to GCC
that let you do only one or

270
00:14:36,070 --> 00:14:37,430
all of these things.

271
00:14:37,430 --> 00:14:40,360
You can, for example, run
the preprocessor GCC.

272
00:14:40,360 --> 00:14:43,050
You can tell it to run the
preprocessor alone and see

273
00:14:43,050 --> 00:14:46,180
what all your macros
expanded to.

274
00:14:46,180 --> 00:14:46,410
Yes?

275
00:14:46,410 --> 00:14:47,265
Question?

276
00:14:47,265 --> 00:14:48,477
AUDIENCE: What's the difference
between compiling

277
00:14:48,477 --> 00:14:50,180
and assembling?

278
00:14:50,180 --> 00:14:53,000
CHARLES LEISERSON: So compiling
reduces it to

279
00:14:53,000 --> 00:14:55,090
essentially assembly language.

280
00:14:55,090 --> 00:15:00,900
And then assembling is taking
that assembly language and

281
00:15:00,900 --> 00:15:04,713
producing the machine binary.

282
00:15:04,713 --> 00:15:06,162
AUDIENCE: I was going to say,
there's a one-to-one

283
00:15:06,162 --> 00:15:09,060
correspondence between machine
code and assembly, but there's

284
00:15:09,060 --> 00:15:10,750
not a one-to-one correspondence
between C code

285
00:15:10,750 --> 00:15:11,958
and assembly.

286
00:15:11,958 --> 00:15:12,450
CHARLES LEISERSON: Yeah.

287
00:15:12,450 --> 00:15:14,850
So there's actually not
quite a one to one,

288
00:15:14,850 --> 00:15:17,020
but it's very close.

289
00:15:17,020 --> 00:15:19,060
It's very close.

290
00:15:19,060 --> 00:15:21,720
So you can think of it as one
to one between assembly

291
00:15:21,720 --> 00:15:22,180
machine code.

292
00:15:22,180 --> 00:15:26,390
But assembly is, in some sense,
a more human readable

293
00:15:26,390 --> 00:15:29,890
and understandable version
of machine code.

294
00:15:29,890 --> 00:15:31,760
In fact, that's what we're
going to talk about.

295
00:15:31,760 --> 00:15:34,210
So let's go directly
to assembly code.

296
00:15:34,210 --> 00:15:38,530
To do that I can use
the minus S switch.

297
00:15:38,530 --> 00:15:39,970
Now it turns out it's
also helpful to

298
00:15:39,970 --> 00:15:41,680
use the minus G switch.

299
00:15:41,680 --> 00:15:46,840
Minus G says give me all the
debugger symbol tables.

300
00:15:46,840 --> 00:15:48,880
And what that makes it is so
that you can actually read the

301
00:15:48,880 --> 00:15:50,360
assembly language.

302
00:15:50,360 --> 00:15:52,980
If you don't have that
information, then you don't

303
00:15:52,980 --> 00:15:56,560
know what the programmer
wrote as the symbols.

304
00:15:56,560 --> 00:16:03,050
Instead you will get
computer generated

305
00:16:03,050 --> 00:16:04,620
symbol names for things.

306
00:16:04,620 --> 00:16:07,180
And you don't have any
meaning to those.

307
00:16:07,180 --> 00:16:08,900
So it's really a good
idea to use minus

308
00:16:08,900 --> 00:16:10,800
G and minus S together.

309
00:16:10,800 --> 00:16:13,700
And this basically provides
a convenient symbolic

310
00:16:13,700 --> 00:16:15,340
representation of the
machine language.

311
00:16:15,340 --> 00:16:17,730
And this is sort of the type
of thing that you'll get,

312
00:16:17,730 --> 00:16:20,080
something coming out that
looks like this.

313
00:16:20,080 --> 00:16:21,800
It's basically an Ascii.

314
00:16:21,800 --> 00:16:26,140
It's in text, characters,
rather than being in the

315
00:16:26,140 --> 00:16:28,710
binary executable.

316
00:16:28,710 --> 00:16:34,770
And if you want, you can find
out all the vagaries of it.

317
00:16:34,770 --> 00:16:40,690
This is one site that has some
reasonable documentation on

318
00:16:40,690 --> 00:16:42,560
the GNU assembler.

319
00:16:42,560 --> 00:16:45,950
It's actually not as good on
the instructions, but it's

320
00:16:45,950 --> 00:16:49,330
really good on all the
directives, which we'll talk

321
00:16:49,330 --> 00:16:53,730
about in a minute like .global
and .type and all that stuff.

322
00:16:53,730 --> 00:16:54,980
It's very good on that stuff.

323
00:16:54,980 --> 00:16:58,520

324
00:16:58,520 --> 00:17:00,260
There's another thing
that you can do.

325
00:17:00,260 --> 00:17:04,220
And once again, it's also
helpful if you've produced a

326
00:17:04,220 --> 00:17:06,569
binary that has the
symbol table.

327
00:17:06,569 --> 00:17:11,760
And that is to do a dump
of the object code.

328
00:17:11,760 --> 00:17:16,030
And when you do a dump of the
object code, what it does is

329
00:17:16,030 --> 00:17:19,079
you basically give it an
executable and it goes

330
00:17:19,079 --> 00:17:24,020
backwards the other way, take
this executable and undo one

331
00:17:24,020 --> 00:17:27,800
step, disassemble it.

332
00:17:27,800 --> 00:17:31,150
And what's good about object
dump is that it gives you,

333
00:17:31,150 --> 00:17:33,710
first of all, these are all
the byte codes of the

334
00:17:33,710 --> 00:17:34,990
instructions.

335
00:17:34,990 --> 00:17:38,750
Also if you've got the minus S
says interleave the source

336
00:17:38,750 --> 00:17:42,060
code, so you can see, here's the
source code interleaved.

337
00:17:42,060 --> 00:17:44,070
So you can see which
regions of code

338
00:17:44,070 --> 00:17:47,280
depend on which things.

339
00:17:47,280 --> 00:17:50,600
And so it basically tells you
where in memory it's being

340
00:17:50,600 --> 00:17:53,970
loaded, it's been loaded, what
the instructions are.

341
00:17:53,970 --> 00:17:57,560
And then it gives you the
assembly interpretation of

342
00:17:57,560 --> 00:17:59,230
that machine binary.

343
00:17:59,230 --> 00:18:02,600
And this is where you can see
it's almost one to one what's

344
00:18:02,600 --> 00:18:03,180
going on here.

345
00:18:03,180 --> 00:18:05,295
Here we have a push
of an operand.

346
00:18:05,295 --> 00:18:07,620
And that notice is just
a one byte code.

347
00:18:07,620 --> 00:18:13,330
Whereas here we've got an opcode
and two arguments.

348
00:18:13,330 --> 00:18:16,690
And it has three bytes
as it turns out.

349
00:18:16,690 --> 00:18:18,640
So you can see there's sort
of a correspondence.

350
00:18:18,640 --> 00:18:20,208
Yeah, question?

351
00:18:20,208 --> 00:18:21,196
AUDIENCE: How does
[? logic then take the ?]

352
00:18:21,196 --> 00:18:24,160
machine language code
and go to--

353
00:18:24,160 --> 00:18:27,124
how does it know the function
names and stuff?

354
00:18:27,124 --> 00:18:28,820
CHARLES LEISERSON: It knows
the function names because

355
00:18:28,820 --> 00:18:33,110
when you compile it with -g,
it produces, in addition to

356
00:18:33,110 --> 00:18:36,250
producing the binary, it
produces a separate segment

357
00:18:36,250 --> 00:18:39,870
that's not loaded in that has
all that information that

358
00:18:39,870 --> 00:18:43,030
says, oh, at this location
is where this symbol is.

359
00:18:43,030 --> 00:18:46,840
And it produces all that as
stuff that's never loaded in

360
00:18:46,840 --> 00:18:49,700
at run time but which
is there in order to

361
00:18:49,700 --> 00:18:51,290
aid debuggers mainly.

362
00:18:51,290 --> 00:18:52,610
Question?

363
00:18:52,610 --> 00:18:56,370
AUDIENCE: To compile something
not using the gflag and then

364
00:18:56,370 --> 00:18:58,715
you do an object dump,
how would that work?

365
00:18:58,715 --> 00:19:01,430
CHARLES LEISERSON: Then what
happens is, first of all, you

366
00:19:01,430 --> 00:19:06,280
would not be able to get
this stuff interleaved.

367
00:19:06,280 --> 00:19:09,720
And then things like here where
it says fib, well, fib

368
00:19:09,720 --> 00:19:14,160
may be an external name so
you might know it anyway.

369
00:19:14,160 --> 00:19:16,970
But if it were an internal name,
you would not be able to

370
00:19:16,970 --> 00:19:18,800
see what it was.

371
00:19:18,800 --> 00:19:21,360
Yeah, if you're going
to respond let's

372
00:19:21,360 --> 00:19:22,660
get you on mike here.

373
00:19:22,660 --> 00:19:24,920
Why don't you just hold this?

374
00:19:24,920 --> 00:19:27,270
JOHN: Yeah, so you'll generally
get the function

375
00:19:27,270 --> 00:19:30,370
names so you know roughly
a huge blob of assembly

376
00:19:30,370 --> 00:19:32,100
corresponds to a function.

377
00:19:32,100 --> 00:19:34,740
But you won't be able to get
any information about what

378
00:19:34,740 --> 00:19:39,520
variables are in which registers
or what position the

379
00:19:39,520 --> 00:19:41,630
sixth line of assembly
corresponds to in terms of

380
00:19:41,630 --> 00:19:42,880
your source code.

381
00:19:42,880 --> 00:19:45,580

382
00:19:45,580 --> 00:19:47,170
CHARLES LEISERSON: Then the
other thing that you can do is

383
00:19:47,170 --> 00:19:50,690
you can actually take the
assembler, the assembly code,

384
00:19:50,690 --> 00:19:53,140
if you produce just the assembly
code, and if you tell

385
00:19:53,140 --> 00:19:57,540
GCC to take a .s file, which is
the assembly code, you can

386
00:19:57,540 --> 00:20:00,146
produce the machine
code from it.

387
00:20:00,146 --> 00:20:05,910
And so one thing that you can
do is you can produce a .s

388
00:20:05,910 --> 00:20:11,820
file and then edit it in Emacs
or VI or whatever your

389
00:20:11,820 --> 00:20:16,180
favorite text editor is and
then assemble it with GCC.

390
00:20:16,180 --> 00:20:18,430
So you can actually
modify what the

391
00:20:18,430 --> 00:20:21,480
machine code is directly.

392
00:20:21,480 --> 00:20:23,220
And that's what we're going to
spend a little bit of time

393
00:20:23,220 --> 00:20:24,580
doing today.

394
00:20:24,580 --> 00:20:28,220
Let's go in and see what the
compiler generates and then

395
00:20:28,220 --> 00:20:29,470
let's twiddle it a bit.

396
00:20:29,470 --> 00:20:34,630

397
00:20:34,630 --> 00:20:37,760
So here's what we're going to
expect that you do, that

398
00:20:37,760 --> 00:20:39,010
you're able to do.

399
00:20:39,010 --> 00:20:41,770

400
00:20:41,770 --> 00:20:45,320
We expect in this class that
you're going to be able to

401
00:20:45,320 --> 00:20:50,370
understand how a compiler
implements the C linguistic

402
00:20:50,370 --> 00:20:52,575
constructs using x86
instructions.

403
00:20:52,575 --> 00:20:55,320

404
00:20:55,320 --> 00:21:00,060
We're going to expect that you
can read x86 assembly language

405
00:21:00,060 --> 00:21:01,520
with the aid of a manual.

406
00:21:01,520 --> 00:21:03,830
We don't expect that you know
all the instructions, but the

407
00:21:03,830 --> 00:21:08,350
basic ones we expect that
you know what those are.

408
00:21:08,350 --> 00:21:10,370
We expect that you're going
to be able to make simple

409
00:21:10,370 --> 00:21:14,840
modifications to the assembly
language generated by a

410
00:21:14,840 --> 00:21:19,200
compiler, and that you would
know, if push came to shove,

411
00:21:19,200 --> 00:21:21,610
how to write your own machine
code on your own.

412
00:21:21,610 --> 00:21:23,700
That's not something we're going
to expect that you do,

413
00:21:23,700 --> 00:21:26,920
but you would know how to get
started to do that if at some

414
00:21:26,920 --> 00:21:28,310
point you said, oh,
I really have to

415
00:21:28,310 --> 00:21:29,560
write this in assembler.

416
00:21:29,560 --> 00:21:32,410

417
00:21:32,410 --> 00:21:35,150
So this is, as I say, really
we're going to take off some

418
00:21:35,150 --> 00:21:40,036
layers of the onion today, try
to get down what's going on.

419
00:21:40,036 --> 00:21:44,420
It turns out this is actually
kind of fun.

420
00:21:44,420 --> 00:21:49,100
Now, the part that's not fun
at some level is the x86 64

421
00:21:49,100 --> 00:21:50,350
machine model.

422
00:21:50,350 --> 00:21:53,670

423
00:21:53,670 --> 00:21:55,680
The x86 is what's
called a complex

424
00:21:55,680 --> 00:21:59,530
instruction set computer.

425
00:21:59,530 --> 00:22:06,590
And these, long ago, were
demonstrated to be inferior to

426
00:22:06,590 --> 00:22:10,940
so-called reduced instruction
set computers.

427
00:22:10,940 --> 00:22:14,050
But that hasn't mattered
in the marketplace.

428
00:22:14,050 --> 00:22:16,290
What's mattered in the
marketplace is who could build

429
00:22:16,290 --> 00:22:18,240
better and faster chips.

430
00:22:18,240 --> 00:22:22,940
And also the amount of people
who started using the x86

431
00:22:22,940 --> 00:22:28,240
instruction set has produced
a huge legacy and inertia.

432
00:22:28,240 --> 00:22:30,860
It's sort of like some people
might argue that Esperanto is

433
00:22:30,860 --> 00:22:32,140
a better language for everybody

434
00:22:32,140 --> 00:22:34,480
to learn than English.

435
00:22:34,480 --> 00:22:38,350
But how come English with all
its complexities and so forth,

436
00:22:38,350 --> 00:22:42,500
and I'm sure for some of you who
have learned English as a

437
00:22:42,500 --> 00:22:45,370
second language, it's like
it's a crazy language.

438
00:22:45,370 --> 00:22:46,640
Who do you learn English?

439
00:22:46,640 --> 00:22:50,040
Well, it's because that's what
everybody's learning.

440
00:22:50,040 --> 00:22:52,560
That's where the legacy is.

441
00:22:52,560 --> 00:22:57,130
And so x86 is very much
like the English of

442
00:22:57,130 --> 00:22:58,920
machines these days.

443
00:22:58,920 --> 00:23:05,030
So in this model there's
basically a flat 64-bit

444
00:23:05,030 --> 00:23:06,280
address space.

445
00:23:06,280 --> 00:23:09,510

446
00:23:09,510 --> 00:23:14,710
There are 16 64-bit general
purpose registers, and then

447
00:23:14,710 --> 00:23:18,800
what are some segment registers,
a register full of

448
00:23:18,800 --> 00:23:27,910
flags, an instruction pointer
register, rest in peace.

449
00:23:27,910 --> 00:23:33,170
They're eight 80-bit floating
point data registers, some

450
00:23:33,170 --> 00:23:37,850
control status registers, an
opcode register, a floating

451
00:23:37,850 --> 00:23:40,540
point instruction pointer
register, and a floating point

452
00:23:40,540 --> 00:23:47,000
data pointing register, some
MMX registers for the

453
00:23:47,000 --> 00:23:56,770
multimedia extensions, and a
128-bit XMM registers for the

454
00:23:56,770 --> 00:24:01,500
SSE instructions, which are the
ability to have an opcode

455
00:24:01,500 --> 00:24:05,040
run over several pieces of data
at once, short vectors,

456
00:24:05,040 --> 00:24:09,440
vector instructions, and a
32-bit register that frankly I

457
00:24:09,440 --> 00:24:13,410
don't have a clue as
to what it does.

458
00:24:13,410 --> 00:24:16,010
So, fortunately, we don't
have to know all these.

459
00:24:16,010 --> 00:24:17,930
You can look at the architecture
manual if any of

460
00:24:17,930 --> 00:24:20,250
these become important.

461
00:24:20,250 --> 00:24:24,170
So our goal is not to memorize
the x86 instruction set.

462
00:24:24,170 --> 00:24:30,390
That would be a punishment
probably worse than death.

463
00:24:30,390 --> 00:24:33,175
The only thing worse would
be learning all of C++.

464
00:24:33,175 --> 00:24:36,960

465
00:24:36,960 --> 00:24:40,800
So here's the general
registers.

466
00:24:40,800 --> 00:24:44,260
So there are basically
64-bit registers.

467
00:24:44,260 --> 00:24:46,090
And here's the mnemonics
that they have.

468
00:24:46,090 --> 00:24:48,370
So you can see is all very
mnemonic, right?

469
00:24:48,370 --> 00:24:50,110
We got some of them
that are numbered.

470
00:24:50,110 --> 00:24:51,560
How come they're all
just not numbered?

471
00:24:51,560 --> 00:24:53,670
I mean come on, right?

472
00:24:53,670 --> 00:24:54,150
I know why.

473
00:24:54,150 --> 00:24:54,880
I know why.

474
00:24:54,880 --> 00:24:56,130
Don't tell me.

475
00:24:56,130 --> 00:24:58,660

476
00:24:58,660 --> 00:25:01,740
So what you get to do is look at
and remember that there are

477
00:25:01,740 --> 00:25:06,240
all these fun registers.

478
00:25:06,240 --> 00:25:11,520
And what they did is the x86
64 architecture grew out of

479
00:25:11,520 --> 00:25:14,730
the x86 which was 32-bit.

480
00:25:14,730 --> 00:25:18,340
Well, in fact, originally
it was 16-bit.

481
00:25:18,340 --> 00:25:21,930
And it's been extended twice
to have more bits in the

482
00:25:21,930 --> 00:25:23,580
instruction word so
that now it's a

483
00:25:23,580 --> 00:25:25,730
64-bit instruction word.

484
00:25:25,730 --> 00:25:28,620
And what they did in order to
make it so that they could run

485
00:25:28,620 --> 00:25:32,750
legacy code more easily, which
might have been written with a

486
00:25:32,750 --> 00:25:37,080
smaller word size, is they've
overlap so that the EAX

487
00:25:37,080 --> 00:25:39,370
register, for example,
is the low order

488
00:25:39,370 --> 00:25:43,740
32-bits of the RAX register.

489
00:25:43,740 --> 00:25:47,320
So what you do is you'll see
that R is the prefix that

490
00:25:47,320 --> 00:25:52,182
says, hey, that's a
64-bit register.

491
00:25:52,182 --> 00:25:56,380
E is the prefix that
says that it is--

492
00:25:56,380 --> 00:25:57,920
Whoops, I made a
mistake there.

493
00:25:57,920 --> 00:25:59,250
Those should all be Es.

494
00:25:59,250 --> 00:26:03,530

495
00:26:03,530 --> 00:26:05,200
Oh, no, sorry, no, there's
are correct.

496
00:26:05,200 --> 00:26:08,310
These are R and then with
D because these are the

497
00:26:08,310 --> 00:26:09,760
extended ones, yes.

498
00:26:09,760 --> 00:26:17,410
So these are D. So R and D, that
means also that it's 16.

499
00:26:17,410 --> 00:26:20,280
So you can see just how easy
this is to remember without a

500
00:26:20,280 --> 00:26:21,670
cheat sheet, right?

501
00:26:21,670 --> 00:26:23,540
And then you go down
to 15, et cetera.

502
00:26:23,540 --> 00:26:25,980
And so you can go all the way
down to byte naming, the low

503
00:26:25,980 --> 00:26:28,510
order byte of the registers.

504
00:26:28,510 --> 00:26:31,450
In addition, it turns out
that that's not all.

505
00:26:31,450 --> 00:26:36,150
But the high order byte of the
16-bit registers are also

506
00:26:36,150 --> 00:26:40,365
available as independently
named registers.

507
00:26:40,365 --> 00:26:45,830

508
00:26:45,830 --> 00:26:49,590
When you're using this in a C
program, there's a convention

509
00:26:49,590 --> 00:26:50,600
that C has.

510
00:26:50,600 --> 00:26:54,570
And it's actually different
on Windows from on Linux.

511
00:26:54,570 --> 00:26:57,450
Because there's no reason they
should make those things

512
00:26:57,450 --> 00:26:58,010
compatible.

513
00:26:58,010 --> 00:27:01,040
That would be too easy.

514
00:27:01,040 --> 00:27:02,460
So instead they have
different ones.

515
00:27:02,460 --> 00:27:07,480
But the ones on Linux, this is
essentially the structure.

516
00:27:07,480 --> 00:27:10,810
What happens when you call a
subroutine is generally you're

517
00:27:10,810 --> 00:27:15,250
passing the arguments to the
subroutine in registers.

518
00:27:15,250 --> 00:27:17,490
And in fact the first
six arguments are

519
00:27:17,490 --> 00:27:19,360
passed in these registers.

520
00:27:19,360 --> 00:27:22,790
RDI, you'll get very
familiar with RDI.

521
00:27:22,790 --> 00:27:26,660
Because that's where the first
argument is always passed.

522
00:27:26,660 --> 00:27:30,530
And almost all your functions
will have a first argument,

523
00:27:30,530 --> 00:27:33,880
except for the ones that have
no arguments, and then the

524
00:27:33,880 --> 00:27:36,080
second arguments, the third,
and so forth, and

525
00:27:36,080 --> 00:27:37,890
then fifth and sixth.

526
00:27:37,890 --> 00:27:41,550
If you get more than six, then
it turns out, then you start

527
00:27:41,550 --> 00:27:44,120
passing arguments
through memory.

528
00:27:44,120 --> 00:27:47,120
But otherwise the convention
is that the arguments are

529
00:27:47,120 --> 00:27:49,650
passed through registers.

530
00:27:49,650 --> 00:27:52,220
There are a couple of other
important registers.

531
00:27:52,220 --> 00:27:59,320
One here is the return value
always comes back in RAX.

532
00:27:59,320 --> 00:28:03,840
So when a function returns,
boom, that's where, RAX is

533
00:28:03,840 --> 00:28:07,210
where the value of
the return is.

534
00:28:07,210 --> 00:28:11,440
There is a base pointer and a
stack pointer which give you

535
00:28:11,440 --> 00:28:15,900
the stack frame so that when you
do a push and want to push

536
00:28:15,900 --> 00:28:18,880
local variables those are
telling you the limits of your

537
00:28:18,880 --> 00:28:19,520
local variats.

538
00:28:19,520 --> 00:28:21,780
And we'll talk more about
that a little bit.

539
00:28:21,780 --> 00:28:23,390
And then there are a variety
of other ones.

540
00:28:23,390 --> 00:28:28,880
Some are callee saved and
some are caller saved.

541
00:28:28,880 --> 00:28:30,880
And you can refer
to this chart.

542
00:28:30,880 --> 00:28:36,000
And there are others similar to
it in the various manuals.

543
00:28:36,000 --> 00:28:39,730
Now, it gets pretty confusing,
if this isn't confusing enough

544
00:28:39,730 --> 00:28:40,980
for the naming.

545
00:28:40,980 --> 00:28:42,960

546
00:28:42,960 --> 00:28:45,540
Let's go on to how you
name data types.

547
00:28:45,540 --> 00:28:47,530
And I think some of you have
already experienced this a

548
00:28:47,530 --> 00:28:50,960
little bit, the beauties
of the data types.

549
00:28:50,960 --> 00:28:56,410
So in C, they have all these
different data types such as

550
00:28:56,410 --> 00:28:58,580
I'm listing here.

551
00:28:58,580 --> 00:29:02,100
And if you want to generate a
constant of that size, so

552
00:29:02,100 --> 00:29:06,570
sometimes the compiler will
coerce a value from one type

553
00:29:06,570 --> 00:29:07,330
to another.

554
00:29:07,330 --> 00:29:09,220
But sometimes it won't.

555
00:29:09,220 --> 00:29:11,520
And so if you want to have a
constant, and I've just given

556
00:29:11,520 --> 00:29:14,410
a couple things here, for
example, if you want it to be

557
00:29:14,410 --> 00:29:18,710
just an int, you can just
write the number.

558
00:29:18,710 --> 00:29:20,350
But if you want it to be
unsigned, you have to

559
00:29:20,350 --> 00:29:22,000
put a U after it.

560
00:29:22,000 --> 00:29:23,620
Or if you want it to be
a long, you have to

561
00:29:23,620 --> 00:29:26,210
put an L after it.

562
00:29:26,210 --> 00:29:29,390
And for many things it'll get
coerced automatically to the

563
00:29:29,390 --> 00:29:32,320
right type because if you do
an operator with another

564
00:29:32,320 --> 00:29:35,100
argument it will be coerced
to that type.

565
00:29:35,100 --> 00:29:37,490
But some of you got burned on
some of the shift things to

566
00:29:37,490 --> 00:29:44,570
begin with because it wasn't
clear what exactly the sizes.

567
00:29:44,570 --> 00:29:50,230
Well, you can be explicit in
C and name them using this

568
00:29:50,230 --> 00:29:52,690
particular convention.

569
00:29:52,690 --> 00:29:57,100
This tells you how many bytes
are being allocated for that

570
00:29:57,100 --> 00:30:03,110
type in the x86 64 size.

571
00:30:03,110 --> 00:30:04,550
So it's [? veted ?]
here for four.

572
00:30:04,550 --> 00:30:06,150
Now, long double
is a funny one.

573
00:30:06,150 --> 00:30:08,910
It's actually allocate
16 bytes, but only

574
00:30:08,910 --> 00:30:11,740
10 of them are used.

575
00:30:11,740 --> 00:30:15,500
So basically there are six bytes
that get unused by that.

576
00:30:15,500 --> 00:30:19,960
And I think that's for future
expansion so that they can

577
00:30:19,960 --> 00:30:21,700
have even wider extension.

578
00:30:21,700 --> 00:30:23,480
This is generally used, of
course, for floating

579
00:30:23,480 --> 00:30:25,010
point and so forth.

580
00:30:25,010 --> 00:30:28,480
Now, in the assembly language,
each of the

581
00:30:28,480 --> 00:30:34,125
operators has a suffix.

582
00:30:34,125 --> 00:30:37,060
And sometimes, if it's a two
operand instruction, it may,

583
00:30:37,060 --> 00:30:39,820
where it's taking things of
different sizes, it may have

584
00:30:39,820 --> 00:30:40,780
two suffixes.

585
00:30:40,780 --> 00:30:44,110
But it has a suffix which is a
single character that tells

586
00:30:44,110 --> 00:30:52,050
you what the size is that
you're working with.

587
00:30:52,050 --> 00:30:55,310
So, for example,
B is for byte.

588
00:30:55,310 --> 00:30:57,980
W is for word because
originally the

589
00:30:57,980 --> 00:31:01,421
words were 16 bits.

590
00:31:01,421 --> 00:31:05,580
L is for long except that
it's not a long

591
00:31:05,580 --> 00:31:07,040
so don't get confused.

592
00:31:07,040 --> 00:31:08,680
L is not long.

593
00:31:08,680 --> 00:31:15,180
Long is a quad word,
or Q, four bytes.

594
00:31:15,180 --> 00:31:19,230
And then a float is an S. A
double is a D. And a long

595
00:31:19,230 --> 00:31:22,100
double is a T.

596
00:31:22,100 --> 00:31:24,920
So these you will get
familiar with.

597
00:31:24,920 --> 00:31:26,160
And they're not so hard.

598
00:31:26,160 --> 00:31:28,290
But that doesn't mean you know
them right off the bat.

599
00:31:28,290 --> 00:31:29,650
And it helps to have
a cheat sheet.

600
00:31:29,650 --> 00:31:32,860

601
00:31:32,860 --> 00:31:36,550
As I say, the main one not to
get confused about is the Ls.

602
00:31:36,550 --> 00:31:42,630
L means something different
in x86 than it means in C.

603
00:31:42,630 --> 00:31:49,250
So, for example, here we have a
move of, and because it's a

604
00:31:49,250 --> 00:31:54,540
Q, I know that it is an eight
byte or a 64-bit operator.

605
00:31:54,540 --> 00:31:57,615
And you can tell that also
because it's using RBP and

606
00:31:57,615 --> 00:32:02,630
RAX, both of which are
64-bit registers.

607
00:32:02,630 --> 00:32:05,330
In fact, in assembly, you can
actually write it without the

608
00:32:05,330 --> 00:32:13,420
Q, because the assembler can
infer when the Q isn't there

609
00:32:13,420 --> 00:32:18,060
that, oh, this is a 64-bit
register, that's a 64-bit

610
00:32:18,060 --> 00:32:21,700
register, I bet he means
move 64-bits.

611
00:32:21,700 --> 00:32:24,020
So it actually fills
that in sometimes.

612
00:32:24,020 --> 00:32:27,220
But sometimes you need
to be explicit.

613
00:32:27,220 --> 00:32:27,580
Question?

614
00:32:27,580 --> 00:32:30,418
AUDIENCE: What happens when
you actually put 64-bit

615
00:32:30,418 --> 00:32:33,640
registers but you only, and
you just put move [? b ?]

616
00:32:33,640 --> 00:32:34,020
or something?

617
00:32:34,020 --> 00:32:34,375
Would it complain?

618
00:32:34,375 --> 00:32:36,490
Would it [UNINTELLIGIBLE]?

619
00:32:36,490 --> 00:32:37,980
CHARLES LEISERSON: Yeah,
it would complain.

620
00:32:37,980 --> 00:32:38,910
Yeah, it would complain.

621
00:32:38,910 --> 00:32:46,750
it'll say it's an improperly
formed instruction, so, yeah.

622
00:32:46,750 --> 00:32:48,110
And the other thing you
can do, of course,

623
00:32:48,110 --> 00:32:50,430
is just try it out.

624
00:32:50,430 --> 00:32:51,920
What happens if?

625
00:32:51,920 --> 00:32:53,200
That's the great thing
about computers.

626
00:32:53,200 --> 00:32:56,140
It's easy to do what
happened if.

627
00:32:56,140 --> 00:33:00,680
Now, the instruction format is
typically an opcode followed

628
00:33:00,680 --> 00:33:03,840
by an operand list.

629
00:33:03,840 --> 00:33:06,760
So the opcode is a short
mnemonic identifying the type

630
00:33:06,760 --> 00:33:12,230
of instruction that includes
typically the single character

631
00:33:12,230 --> 00:33:15,390
suffix indicating
the data type.

632
00:33:15,390 --> 00:33:18,570
However, for some instructions
it turns out you can have two

633
00:33:18,570 --> 00:33:20,610
suffixes if the two--

634
00:33:20,610 --> 00:33:25,670
Most instructions operate on
data types of the same size.

635
00:33:25,670 --> 00:33:31,300
But some of them operate on two
different sizes in which

636
00:33:31,300 --> 00:33:33,130
case you'll have two suffixes.

637
00:33:33,130 --> 00:33:35,480
If the suffix is missing, it can
generally be inferred, as

638
00:33:35,480 --> 00:33:36,710
I mentioned.

639
00:33:36,710 --> 00:33:41,580
Then the operand list is from
zero, two, and very rarely

640
00:33:41,580 --> 00:33:45,840
three operands separated
by commas.

641
00:33:45,840 --> 00:33:50,090
Now, in the architecture manual,
in fact, they say if

642
00:33:50,090 --> 00:33:54,790
you look at it, they'll show
you fourth operand.

643
00:33:54,790 --> 00:33:56,260
And I said, four operands?

644
00:33:56,260 --> 00:33:58,270
This documentation says
there's only three.

645
00:33:58,270 --> 00:34:00,460
This one says there's four.

646
00:34:00,460 --> 00:34:01,520
I went through the whole

647
00:34:01,520 --> 00:34:03,170
architecture manual last night.

648
00:34:03,170 --> 00:34:06,660

649
00:34:06,660 --> 00:34:10,210
Every time it says four
operands, it says N/A, not

650
00:34:10,210 --> 00:34:11,500
applicable.

651
00:34:11,500 --> 00:34:14,540
So I think it's just there
reserved or something.

652
00:34:14,540 --> 00:34:20,070
But anyway there is no fourth
operand as far as I can tell.

653
00:34:20,070 --> 00:34:25,690
Now, one of the operands
is the destination.

654
00:34:25,690 --> 00:34:28,489
And here's where we start to
get into some differences.

655
00:34:28,489 --> 00:34:32,139
There's actually two standard
formats for assembly language

656
00:34:32,139 --> 00:34:36,630
that are generally called Intel
and AT&T. So AT&T was

657
00:34:36,630 --> 00:34:41,469
the original Unix system.

658
00:34:41,469 --> 00:34:45,969
And Intel is what Intel uses
for their assembler.

659
00:34:45,969 --> 00:34:51,389
They do the destination operand
in the opposite order.

660
00:34:51,389 --> 00:34:55,000
So AT&T, it puts the
destination last.

661
00:34:55,000 --> 00:34:57,910
In Intel it puts the
destination first.

662
00:34:57,910 --> 00:35:00,720
So when you're reading
documentation, you can read

663
00:35:00,720 --> 00:35:01,595
the Intel documentation.

664
00:35:01,595 --> 00:35:04,020
You just have to remember to
flip it around if you're

665
00:35:04,020 --> 00:35:07,260
actually writing it as we will
be using the AT&T format.

666
00:35:07,260 --> 00:35:09,860
Almost everybody uses
AT&T as far as I

667
00:35:09,860 --> 00:35:13,370
can tell except Intel.

668
00:35:13,370 --> 00:35:16,740
So Intel's assembler does
it the other way around.

669
00:35:16,740 --> 00:35:20,030
And actually now GCC will
actually, you can give it a

670
00:35:20,030 --> 00:35:28,190
directive to say I'm now
switching to writing it in

671
00:35:28,190 --> 00:35:29,480
Intel assembler.

672
00:35:29,480 --> 00:35:31,450
So you can actually go back and
forth between the two if

673
00:35:31,450 --> 00:35:34,080
you happen to borrow some
assembly language code from

674
00:35:34,080 --> 00:35:35,510
somebody else.

675
00:35:35,510 --> 00:35:37,620
So one of them is
the destination.

676
00:35:37,620 --> 00:35:42,680
The other operations are
read-only, so const in the C++

677
00:35:42,680 --> 00:35:44,230
terminology.

678
00:35:44,230 --> 00:35:45,540
They're read-only.

679
00:35:45,540 --> 00:35:49,140
So it's always the case that
only one of them is going to

680
00:35:49,140 --> 00:35:50,160
be modified.

681
00:35:50,160 --> 00:35:54,770
And that's the one that's the
destination of the operation.

682
00:35:54,770 --> 00:35:56,730
In addition in assembler, there
are what are called

683
00:35:56,730 --> 00:35:57,570
directives.

684
00:35:57,570 --> 00:36:01,840
Besides the instructions,
there are directives.

685
00:36:01,840 --> 00:36:03,740
So first of all there are
things like labels.

686
00:36:03,740 --> 00:36:07,780
You can take any instruction
and put an identifier and a

687
00:36:07,780 --> 00:36:12,200
colon, and that becomes then
a way of naming that

688
00:36:12,200 --> 00:36:14,040
place in your code.

689
00:36:14,040 --> 00:36:16,320
So, for example, jump
instructions want to know to

690
00:36:16,320 --> 00:36:18,480
where they're jumping.

691
00:36:18,480 --> 00:36:23,050
And rather than having to know
upfront what the address is,

692
00:36:23,050 --> 00:36:25,740
the assembler will calculate
what that address is and

693
00:36:25,740 --> 00:36:29,420
everywhere you put x, it'll
put in the right value.

694
00:36:29,420 --> 00:36:33,130
And you get to name it
symbolically rather than as an

695
00:36:33,130 --> 00:36:35,700
absolute machine location.

696
00:36:35,700 --> 00:36:37,180
There are storage directives.

697
00:36:37,180 --> 00:36:41,560
So, for example, .space 20
says allocate 20 bytes at

698
00:36:41,560 --> 00:36:43,270
location x.

699
00:36:43,270 --> 00:36:49,890
.long says store the
constant 172 at y.

700
00:36:49,890 --> 00:36:53,620
It's being stored at y because
I said y is here.

701
00:36:53,620 --> 00:36:57,100
And asciz gives you a string
that's zero terminated.

702
00:36:57,100 --> 00:37:00,720
So the standard for strings
is zero terminated.

703
00:37:00,720 --> 00:37:03,390
You can also, there's one
that says give me a

704
00:37:03,390 --> 00:37:05,630
nonterminated string.

705
00:37:05,630 --> 00:37:08,740
So you can have fun with
that if you like that.

706
00:37:08,740 --> 00:37:12,150
The align directive says make
sure that as you're going

707
00:37:12,150 --> 00:37:14,040
through, so what's happening
is the assembler is going

708
00:37:14,040 --> 00:37:16,640
through there, is it's laying
these things out in memory

709
00:37:16,640 --> 00:37:19,300
typically sequentially, the way
you wrote it down in the

710
00:37:19,300 --> 00:37:23,460
program, in the assembly
language program.

711
00:37:23,460 --> 00:37:26,460
If you say align eight, it
says advance whatever the

712
00:37:26,460 --> 00:37:29,480
pointer is of where the next
thing is going to be put to be

713
00:37:29,480 --> 00:37:31,910
a multiple of eight.

714
00:37:31,910 --> 00:37:34,600
And that way you don't run the
risk of where you declare a

715
00:37:34,600 --> 00:37:39,550
character and then you say, OK,
and now I want a long or

716
00:37:39,550 --> 00:37:43,930
something, and it's not aligned
in a way that the

717
00:37:43,930 --> 00:37:48,700
eight bytes correspond to a
multiple of eight the way you

718
00:37:48,700 --> 00:37:50,300
need to in order for
the instructions to

719
00:37:50,300 --> 00:37:52,890
properly work on them.

720
00:37:52,890 --> 00:37:56,350
So generally, although, we
have byte pointers, most

721
00:37:56,350 --> 00:37:58,445
instructions only work
on aligned values.

722
00:37:58,445 --> 00:38:01,760

723
00:38:01,760 --> 00:38:05,050
And for some of them that work
on unaligned values, they're

724
00:38:05,050 --> 00:38:09,760
generally slower than the ones
that work on aligned values.

725
00:38:09,760 --> 00:38:11,260
There are also segment
directives.

726
00:38:11,260 --> 00:38:16,510
So in memory when you run your
program, the executing program

727
00:38:16,510 --> 00:38:20,810
starts with the program text
down at the bottom of memory.

728
00:38:20,810 --> 00:38:25,830
And then it has fixed data
that's not going to change,

729
00:38:25,830 --> 00:38:27,410
static allocation of data.

730
00:38:27,410 --> 00:38:30,870
And then it's got heap, which is
dynamically allocated data.

731
00:38:30,870 --> 00:38:32,010
And then it's got stack.

732
00:38:32,010 --> 00:38:36,230
The stack grows downward, and
the heap grows upward.

733
00:38:36,230 --> 00:38:39,530
By saying something like text,
it says make sure that the

734
00:38:39,530 --> 00:38:42,010
next stuff I'm putting goes
into the text segment.

735
00:38:42,010 --> 00:38:45,430
So that's generally where
you put your code.

736
00:38:45,430 --> 00:38:47,590
Saying it's in data says make
sure it goes in here.

737
00:38:47,590 --> 00:38:51,500
So you may want to have
a table, for example.

738
00:38:51,500 --> 00:38:54,230
So, for example, from pentominos
you might have a

739
00:38:54,230 --> 00:38:55,470
table there.

740
00:38:55,470 --> 00:38:56,610
There's going to be
a fixed table.

741
00:38:56,610 --> 00:38:57,740
You're never going to
change it during the

742
00:38:57,740 --> 00:38:59,050
running of the program.

743
00:38:59,050 --> 00:39:00,300
Put it in the data segment.

744
00:39:00,300 --> 00:39:03,790

745
00:39:03,790 --> 00:39:06,060
And then there's also things
like scope and linkage

746
00:39:06,060 --> 00:39:06,550
directives.

747
00:39:06,550 --> 00:39:09,640
So saying .global, and you can
either spell it incorrectly,

748
00:39:09,640 --> 00:39:14,310
as I have here, or with the a,
it's the same thing for the

749
00:39:14,310 --> 00:39:16,640
GNU assembler anyway.

750
00:39:16,640 --> 00:39:21,130
It says make the symbol fib
externally visible.

751
00:39:21,130 --> 00:39:23,760
And that makes sure that it goes
into the symbol table so

752
00:39:23,760 --> 00:39:26,910
that debuggers and things
can look at it.

753
00:39:26,910 --> 00:39:29,980
And there's a lot more of these
in the assembler manual,

754
00:39:29,980 --> 00:39:33,140
that link that I showed you to
before, tells you what all the

755
00:39:33,140 --> 00:39:33,940
directives mean.

756
00:39:33,940 --> 00:39:35,880
So that when you're looking at
code, which mostly you'll be

757
00:39:35,880 --> 00:39:38,680
reading it, making a few changes
to it, you can know

758
00:39:38,680 --> 00:39:39,930
what things mean.

759
00:39:39,930 --> 00:39:42,520

760
00:39:42,520 --> 00:39:47,020
So the opcode examples,
here's some examples.

761
00:39:47,020 --> 00:39:49,500
There are things like
mov, push, and pop.

762
00:39:49,500 --> 00:39:53,970
So, for example, here movslq,
this is an interesting one

763
00:39:53,970 --> 00:39:55,150
because it's moving.

764
00:39:55,150 --> 00:40:01,000
The s says extend the sign
because I'm moving from a long

765
00:40:01,000 --> 00:40:09,930
from a 32-bit word to a 64-bit
word, from 4-bits to 8-bits.

766
00:40:09,930 --> 00:40:14,490
So that's why this one takes
two suffixes moving from,

767
00:40:14,490 --> 00:40:17,470
you'll notice, a 32-bit register
to a 64-bit register.

768
00:40:17,470 --> 00:40:20,160

769
00:40:20,160 --> 00:40:21,570
So you have to be careful.

770
00:40:21,570 --> 00:40:24,840
This is something I got caught
up in the other day.

771
00:40:24,840 --> 00:40:27,960
The results of 32-bit operations
are implicitly

772
00:40:27,960 --> 00:40:30,620
extended to 64-bit values.

773
00:40:30,620 --> 00:40:34,330
So if you store something into
EAX, for example, it

774
00:40:34,330 --> 00:40:40,220
automatically zeroes out the
high order 32-bits of RAX.

775
00:40:40,220 --> 00:40:42,130
Because that's the one that
it's embedded in.

776
00:40:42,130 --> 00:40:46,340
However, that's not true for
the eight and 16-bit

777
00:40:46,340 --> 00:40:47,580
operations.

778
00:40:47,580 --> 00:40:52,910
If you store into an 8-bit
field, an 8-bit part of the

779
00:40:52,910 --> 00:40:55,750
register, it does not
zero out the high

780
00:40:55,750 --> 00:40:58,120
order bits of the remainder.

781
00:40:58,120 --> 00:41:02,340
So you just have to be careful
when you're doing that.

782
00:41:02,340 --> 00:41:06,370
Most of these are things, by the
way, that is more cryptic

783
00:41:06,370 --> 00:41:07,250
when you're looking at stuff.

784
00:41:07,250 --> 00:41:10,850
It's like, oh, how come it's,
gee, I thought I had, I'm

785
00:41:10,850 --> 00:41:13,770
returning a double word, but
it looks here like it's

786
00:41:13,770 --> 00:41:16,440
returning a 32-bit word,
how come I thought I

787
00:41:16,440 --> 00:41:18,270
was returning 64--

788
00:41:18,270 --> 00:41:20,060
Well, the answer is because
it knows the high

789
00:41:20,060 --> 00:41:21,310
order bits are zero.

790
00:41:21,310 --> 00:41:23,343
So it's using the shorter
instructions.

791
00:41:23,343 --> 00:41:26,230

792
00:41:26,230 --> 00:41:33,780
And yet it still is having the
impact on the 64-bit register.

793
00:41:33,780 --> 00:41:37,000
They're all the arithmetic
and logical operations.

794
00:41:37,000 --> 00:41:40,810
So subtracting, once again,
the destination is second.

795
00:41:40,810 --> 00:41:43,170
So typically these are
two operator things.

796
00:41:43,170 --> 00:41:45,820
So you always have the
destination occurs both at the

797
00:41:45,820 --> 00:41:49,130
beginning and on the left hand
side and the right hand side,

798
00:41:49,130 --> 00:41:53,027
things like shifts and rotates,
control transfer, so

799
00:41:53,027 --> 00:41:56,190
call which does a subroutine
jump, return from a

800
00:41:56,190 --> 00:42:00,650
subroutine, a jump instruction
that just says make the next

801
00:42:00,650 --> 00:42:03,320
instruction the thing that
you're pointing to, and very

802
00:42:03,320 --> 00:42:05,380
important, the jump conditionals
where the

803
00:42:05,380 --> 00:42:08,460
condition is a whole bunch of
keys that are things like

804
00:42:08,460 --> 00:42:12,420
greater than, less than, and so
forth, and different ones

805
00:42:12,420 --> 00:42:16,400
for signed and unsigned
and so forth.

806
00:42:16,400 --> 00:42:20,210
So typically the condition is
computed by using a compare

807
00:42:20,210 --> 00:42:20,800
instruction.

808
00:42:20,800 --> 00:42:24,570
I probably should have put CNP
on here as well, but I didn't.

809
00:42:24,570 --> 00:42:27,540
But the CNP instruction is
usually what you use to

810
00:42:27,540 --> 00:42:30,900
compare two things and then you
separately jump on what

811
00:42:30,900 --> 00:42:32,960
the condition is.

812
00:42:32,960 --> 00:42:37,330
There's a pretty nice
website that has

813
00:42:37,330 --> 00:42:39,840
most of these opcodes.

814
00:42:39,840 --> 00:42:44,590
However, they only deal
with the old x86

815
00:42:44,590 --> 00:42:46,970
without the 64-bit extension.

816
00:42:46,970 --> 00:42:49,390
And they use the Intel syntax.

817
00:42:49,390 --> 00:42:50,570
But it's really convenient.

818
00:42:50,570 --> 00:42:55,140
Because they've done a nice
job of making a quick jump

819
00:42:55,140 --> 00:42:57,220
table where you can just
go, look up the

820
00:42:57,220 --> 00:42:59,200
opcode, and pop it up.

821
00:42:59,200 --> 00:43:02,560
Otherwise you can just look
at them in the manual.

822
00:43:02,560 --> 00:43:04,420
Anyway, that's kind of
a convenient place.

823
00:43:04,420 --> 00:43:07,180
And, as I say, just beware
because it's 32-bit only, and

824
00:43:07,180 --> 00:43:08,350
it's Intel syntax.

825
00:43:08,350 --> 00:43:10,190
Most of the instructions
got extended.

826
00:43:10,190 --> 00:43:15,240
I mean it's like, OK, if you do
it for eight and 16 and 32,

827
00:43:15,240 --> 00:43:18,720
the operation is not going to
change that much to go to 64.

828
00:43:18,720 --> 00:43:20,780
A few of them do, however.

829
00:43:20,780 --> 00:43:27,520
Now, the operands, Intel
supports, the x86, which is

830
00:43:27,520 --> 00:43:30,820
Intel and AMD, typically,
support all kinds of

831
00:43:30,820 --> 00:43:33,870
addressing modes.

832
00:43:33,870 --> 00:43:36,550
The rule is that only
one operand,

833
00:43:36,550 --> 00:43:41,060
however, can address memory.

834
00:43:41,060 --> 00:43:43,180
So you have to pick which is
the operand that's going to

835
00:43:43,180 --> 00:43:45,600
address memory if you have
multiple operands.

836
00:43:45,600 --> 00:43:47,270
You can't have both operands.

837
00:43:47,270 --> 00:43:50,860
So you can't add two
things in memory.

838
00:43:50,860 --> 00:43:52,800
You always have to take
something for memory and bring

839
00:43:52,800 --> 00:43:56,880
it into a register and then
store it back out.

840
00:43:56,880 --> 00:44:00,610
So the simplest one is two
register instructions.

841
00:44:00,610 --> 00:44:04,080
Here I've basically
marked the--

842
00:44:04,080 --> 00:44:05,030
What have I marked here?

843
00:44:05,030 --> 00:44:07,515
I guess I marked--

844
00:44:07,515 --> 00:44:07,820
I don't know.

845
00:44:07,820 --> 00:44:09,090
Down here I was marking
memory.

846
00:44:09,090 --> 00:44:10,820
I'm not sure what I was
marking up here.

847
00:44:10,820 --> 00:44:12,070
Because they're both
registers.

848
00:44:12,070 --> 00:44:14,270

849
00:44:14,270 --> 00:44:18,710
But in any case, this is just
adding RBX into RAX.

850
00:44:18,710 --> 00:44:21,850
And so it takes the contents
of RBX adds it in

851
00:44:21,850 --> 00:44:23,730
the contents of RAX.

852
00:44:23,730 --> 00:44:26,360
There's something that's
called direct.

853
00:44:26,360 --> 00:44:31,780
So this is, it says, where you
move, x is some constant

854
00:44:31,780 --> 00:44:37,130
value, and you move it, the
contents of it, into RDI.

855
00:44:37,130 --> 00:44:39,270
So if x, for example, is a
location that you've stored a

856
00:44:39,270 --> 00:44:42,570
value in, you can say move
whatever is the value at that

857
00:44:42,570 --> 00:44:46,450
location into RDI.

858
00:44:46,450 --> 00:44:51,240
Immediate says, which usually is
preceded by a dollar sign,

859
00:44:51,240 --> 00:44:54,990
says move the address of it,
move that as a constant.

860
00:44:54,990 --> 00:44:57,970
So x has a value,
move that value.

861
00:44:57,970 --> 00:45:02,290
So if you say $3, then
you'll move the

862
00:45:02,290 --> 00:45:03,740
constant three into RDI.

863
00:45:03,740 --> 00:45:07,690
If you said mov3 this, you're
going to move the contents of

864
00:45:07,690 --> 00:45:11,030
location three in memory.

865
00:45:11,030 --> 00:45:16,250
So that's the difference between
direct and immediate.

866
00:45:16,250 --> 00:45:19,720
So the dollar sign says
you're taking that

867
00:45:19,720 --> 00:45:21,350
as a literal constant.

868
00:45:21,350 --> 00:45:24,610
And the direct says you're
actually going to memory and

869
00:45:24,610 --> 00:45:26,470
fetching it.

870
00:45:26,470 --> 00:45:28,020
Then things start getting
interesting.

871
00:45:28,020 --> 00:45:33,215
Register indirect says, in
this case, the thing that

872
00:45:33,215 --> 00:45:34,900
you're going to access
is the thing

873
00:45:34,900 --> 00:45:38,050
pointed to by that register.

874
00:45:38,050 --> 00:45:42,610
So don't move, in this case, RBX
into RAX, move it to the

875
00:45:42,610 --> 00:45:47,920
memory location that
RAX is pointing to.

876
00:45:47,920 --> 00:45:51,160
Then you can do register index
which says, well, it's

877
00:45:51,160 --> 00:45:57,300
pointing to it, but I want
displaced 172-bytes off of

878
00:45:57,300 --> 00:46:01,930
that location, of whatever
this is pointing to.

879
00:46:01,930 --> 00:46:04,470
So, for example, if you have a
pointer to a record, you can

880
00:46:04,470 --> 00:46:07,050
then have just a single pointer
and address all the

881
00:46:07,050 --> 00:46:10,320
fields just by doing register
indirect to the different

882
00:46:10,320 --> 00:46:11,940
fields using that
same register.

883
00:46:11,940 --> 00:46:15,430

884
00:46:15,430 --> 00:46:17,650
Then there is, it actually--

885
00:46:17,650 --> 00:46:21,220
I skipped, actually, a few in
here that are subsets of this.

886
00:46:21,220 --> 00:46:24,500
This is, I think, the most
complicated one that I know.

887
00:46:24,500 --> 00:46:27,520
It's base index scale
displacement where base and

888
00:46:27,520 --> 00:46:31,100
index are registers, the scale
is two, four, eight, and if

889
00:46:31,100 --> 00:46:33,050
it's not there, it
implies one.

890
00:46:33,050 --> 00:46:37,710
The displacement is eight,
16, or a 32-bit value.

891
00:46:37,710 --> 00:46:45,260
And it says take RD--

892
00:46:45,260 --> 00:46:48,570
oh, I had put the math on here,
and then I guess I lost

893
00:46:48,570 --> 00:46:54,640
it-- it says take RDX, multiply
it by eight, add RDI,

894
00:46:54,640 --> 00:46:57,141
and add 172.

895
00:46:57,141 --> 00:46:58,391
[WHISTLE].

896
00:46:58,391 --> 00:47:00,290

897
00:47:00,290 --> 00:47:02,700
So, anyway, you can look
in the manual.

898
00:47:02,700 --> 00:47:05,860
So you'll see some of these
instructions being generated.

899
00:47:05,860 --> 00:47:08,030
Generally, you're not going to
generate these instructions.

900
00:47:08,030 --> 00:47:10,780
So when you see them generated,
you can see it.

901
00:47:10,780 --> 00:47:14,220
And then, and this is actually
new, it's not in the x86.

902
00:47:14,220 --> 00:47:18,270
It has this instruction pointer
where you can actually

903
00:47:18,270 --> 00:47:22,350
access where the current program
counter is pointing,

904
00:47:22,350 --> 00:47:25,490
where it is in the code, and
store that value, in this case

905
00:47:25,490 --> 00:47:28,660
indexed by six, into RAX.

906
00:47:28,660 --> 00:47:30,760
So you can do it relative
where this has

907
00:47:30,760 --> 00:47:33,200
to be a 32-bit constant.

908
00:47:33,200 --> 00:47:37,210
And what's good about that is
it allows you then to write

909
00:47:37,210 --> 00:47:39,930
code where you can do things
like jump to something that's

910
00:47:39,930 --> 00:47:42,690
relative to the program
counter.

911
00:47:42,690 --> 00:47:46,020
And that lets you put the code
anywhere in memory, and it

912
00:47:46,020 --> 00:47:48,010
still has the same behavior.

913
00:47:48,010 --> 00:47:51,190
Because you're going relative
to where that code is rather

914
00:47:51,190 --> 00:47:54,080
than to an absolute location.

915
00:47:54,080 --> 00:47:56,315
So it allows the code
to be relocatable.

916
00:47:56,315 --> 00:48:00,580

917
00:48:00,580 --> 00:48:02,540
So here's--

918
00:48:02,540 --> 00:48:03,740
Yeah, questions, yeah sure.

919
00:48:03,740 --> 00:48:07,044
AUDIENCE: Why was it the index
registers, when you have the

920
00:48:07,044 --> 00:48:09,876
numbering for the register,
what does that mean again?

921
00:48:09,876 --> 00:48:11,680
CHARLES LEISERSON: The number
before the register?

922
00:48:11,680 --> 00:48:13,099
AUDIENCE: In the instruction
[UNINTELLIGIBLE],

923
00:48:13,099 --> 00:48:15,120
that's 60 for RID.

924
00:48:15,120 --> 00:48:18,610
CHARLES LEISERSON: OK, or
whatever, whenever it's here,

925
00:48:18,610 --> 00:48:22,440
it's basically saying,
add that value to

926
00:48:22,440 --> 00:48:25,900
the contents of RAX.

927
00:48:25,900 --> 00:48:29,490
And so the same thing here, add
six to the contents of the

928
00:48:29,490 --> 00:48:30,050
instruction.

929
00:48:30,050 --> 00:48:35,510
So this is six bytes ahead of me
in the instruction stream.

930
00:48:35,510 --> 00:48:36,360
OK?

931
00:48:36,360 --> 00:48:39,230
So you can actually say, well,
what's that instruction ahead

932
00:48:39,230 --> 00:48:41,380
of me in the instructions
stream?

933
00:48:41,380 --> 00:48:42,630
OK?

934
00:48:42,630 --> 00:48:45,350

935
00:48:45,350 --> 00:48:50,490
So here's some examples of
essentially the same code and

936
00:48:50,490 --> 00:48:53,470
how it gets compiled.

937
00:48:53,470 --> 00:48:57,420
So here we're going to have
a fou1, fou2, fou3.

938
00:48:57,420 --> 00:49:00,690
And in this case we declare
x, y, and z

939
00:49:00,690 --> 00:49:02,910
to be unsigned integers.

940
00:49:02,910 --> 00:49:04,450
We set them to some values.

941
00:49:04,450 --> 00:49:10,900
And we just simply say
return x plus y or z,

942
00:49:10,900 --> 00:49:12,880
bitwise OR with z.

943
00:49:12,880 --> 00:49:16,450
If you look at what the code is
that's generated, it says

944
00:49:16,450 --> 00:49:19,612
move the constant 45 into EAX.

945
00:49:19,612 --> 00:49:22,740

946
00:49:22,740 --> 00:49:23,800
Why does it do that?

947
00:49:23,800 --> 00:49:24,990
Well, let's just see.

948
00:49:24,990 --> 00:49:28,930
Well, the compiler figures out
that it knows what 35, seven,

949
00:49:28,930 --> 00:49:29,890
and 45 are.

950
00:49:29,890 --> 00:49:31,560
It computes x plus y.

951
00:49:31,560 --> 00:49:32,975
That's 41.

952
00:49:32,975 --> 00:49:37,930
If you take 41 bitwise OR with
45, it turns out it's masking

953
00:49:37,930 --> 00:49:40,040
the same bits, that's 45.

954
00:49:40,040 --> 00:49:43,600
So the compiler actually can
figure this out that all it

955
00:49:43,600 --> 00:49:48,040
has to do is return 45
in a 64-bit register.

956
00:49:48,040 --> 00:49:51,620
Ah, but here it's returning
it in a 32-bit register.

957
00:49:51,620 --> 00:49:52,770
What happened?

958
00:49:52,770 --> 00:49:53,990
It's not obeying the type.

959
00:49:53,990 --> 00:49:56,200
The type is supposed
to be 64-bits, but

960
00:49:56,200 --> 00:49:58,460
that's a 32-bit register.

961
00:49:58,460 --> 00:50:01,400
Oh, yeah, that's this thing
where it automatically zeroes

962
00:50:01,400 --> 00:50:03,540
out the high order bits.

963
00:50:03,540 --> 00:50:07,200
And it uses this instruction,
because this is a shorter

964
00:50:07,200 --> 00:50:10,030
instruction than if
it did the RAX.

965
00:50:10,030 --> 00:50:12,550
It could do the same thing with
RAX, but it would be more

966
00:50:12,550 --> 00:50:13,640
bytes of instruction.

967
00:50:13,640 --> 00:50:17,020
So they saved a couple bytes of
instruction by doing that.

968
00:50:17,020 --> 00:50:20,240
So people follow what
happened there?

969
00:50:20,240 --> 00:50:21,290
Let's take a look
at the next one.

970
00:50:21,290 --> 00:50:23,770
Here it's the same code
just let's pass

971
00:50:23,770 --> 00:50:26,890
those things as arguments.

972
00:50:26,890 --> 00:50:29,280
Well, if you remember the
calling convention, parameter

973
00:50:29,280 --> 00:50:32,640
one is in RDI, parameter
two is in RSI, and

974
00:50:32,640 --> 00:50:35,580
parameter three is in RDX.

975
00:50:35,580 --> 00:50:37,170
So I don't expect you
to remember that off

976
00:50:37,170 --> 00:50:37,920
the top your head.

977
00:50:37,920 --> 00:50:41,020
But we have the cheat sheet, and
you can figure that out.

978
00:50:41,020 --> 00:50:43,520
So here's what it does is, oh
my goodness, what is that

979
00:50:43,520 --> 00:50:44,530
instruction?

980
00:50:44,530 --> 00:50:51,070
This is actually a computation
of effective address.

981
00:50:51,070 --> 00:50:54,100
So the effective address is
basically saying, and it's

982
00:50:54,100 --> 00:50:56,610
using one of these funny
indexing modes.

983
00:50:56,610 --> 00:51:00,650
So what this is actually doing
is it's actually adding these

984
00:51:00,650 --> 00:51:05,690
two numbers together, the
values stored in those

985
00:51:05,690 --> 00:51:08,940
locations together, and
storing it into RAX.

986
00:51:08,940 --> 00:51:16,990
And then it's then OR-ing RDX,
what's in RDX, with RAX and

987
00:51:16,990 --> 00:51:17,800
then returning.

988
00:51:17,800 --> 00:51:22,280
Remember that RAX is where the
result is always going to be.

989
00:51:22,280 --> 00:51:24,470
So the result is always
returned in RAX.

990
00:51:24,470 --> 00:51:27,360
So you can see you have to do a
little bit more complicated

991
00:51:27,360 --> 00:51:30,240
addressing in order to pull them
out as parameters than if

992
00:51:30,240 --> 00:51:32,690
it could actually figure out
what the numbers are.

993
00:51:32,690 --> 00:51:35,890
Last example here is I declared
these things before I

994
00:51:35,890 --> 00:51:38,760
ever got their globals.

995
00:51:38,760 --> 00:51:40,540
And so I declared them
before I ever got in.

996
00:51:40,540 --> 00:51:42,530
So that means since they're
globals, they have a fixed

997
00:51:42,530 --> 00:51:44,200
place in memory.

998
00:51:44,200 --> 00:51:49,180
And so the code that's generated
is moving, it turns

999
00:51:49,180 --> 00:51:53,790
out it allocates them right
nearby the instructions here.

1000
00:51:53,790 --> 00:51:56,710
And so what it does is it has
actually a relative offset for

1001
00:51:56,710 --> 00:52:03,490
x, relativity instruction
pointer, put that in RAX, add

1002
00:52:03,490 --> 00:52:08,420
the offset of x into it, and
then OR it with the offset of

1003
00:52:08,420 --> 00:52:10,990
z, and then return.

1004
00:52:10,990 --> 00:52:13,960
And so there the constants are
actually stored right nearby

1005
00:52:13,960 --> 00:52:19,200
in the code so that they can
use this relative offset.

1006
00:52:19,200 --> 00:52:23,470
And the compiler figures out, or
the assembler figures out,

1007
00:52:23,470 --> 00:52:26,640
exactly what the offset is
that it actually needs to

1008
00:52:26,640 --> 00:52:30,410
substitute for y so that it can
be a relative offset from

1009
00:52:30,410 --> 00:52:32,120
the current instruction
pointer.

1010
00:52:32,120 --> 00:52:33,970
Notice that, for example,
that's going to change

1011
00:52:33,970 --> 00:52:35,980
depending upon the
value of y here.

1012
00:52:35,980 --> 00:52:39,700
It's going to change compared to
if I accessed y down here.

1013
00:52:39,700 --> 00:52:40,960
It would be a different
instruction

1014
00:52:40,960 --> 00:52:43,070
pointer at this point.

1015
00:52:43,070 --> 00:52:45,020
So it actually just goes but
it computes what the

1016
00:52:45,020 --> 00:52:50,710
difference is so it knows
what the distance is.

1017
00:52:50,710 --> 00:52:53,730
It can compute that at compile
time, and then at execution

1018
00:52:53,730 --> 00:52:57,440
time it just uses whatever
constant goes in there.

1019
00:52:57,440 --> 00:52:59,540
So the important thing here is
just to notice that the code

1020
00:52:59,540 --> 00:53:01,990
depends upon where x, y,
and z are allocated.

1021
00:53:01,990 --> 00:53:06,010

1022
00:53:06,010 --> 00:53:11,420
So the first thing to actually
look at good code is to

1023
00:53:11,420 --> 00:53:13,400
understand the calling
convention

1024
00:53:13,400 --> 00:53:15,915
that's used by the compiler.

1025
00:53:15,915 --> 00:53:19,950

1026
00:53:19,950 --> 00:53:21,550
And here are the basics of it.

1027
00:53:21,550 --> 00:53:24,910
So the register RSP points
to the function

1028
00:53:24,910 --> 00:53:26,620
call stack in memory.

1029
00:53:26,620 --> 00:53:29,750
And the call stack grows
downward in memory, like in

1030
00:53:29,750 --> 00:53:33,080
that little map I showed you
before, so that as you push

1031
00:53:33,080 --> 00:53:36,040
things onto the stack they're
getting lower numbered not

1032
00:53:36,040 --> 00:53:37,290
higher numbered.

1033
00:53:37,290 --> 00:53:40,260

1034
00:53:40,260 --> 00:53:43,120
The call instruction pushes
the current instruction

1035
00:53:43,120 --> 00:53:47,780
pointer onto the stack, jumps
to the call target operand,

1036
00:53:47,780 --> 00:53:51,550
which is basically the address
of the thing you're calling.

1037
00:53:51,550 --> 00:53:55,020
So when you do a call,
it saves your return

1038
00:53:55,020 --> 00:53:57,580
address on the stack.

1039
00:53:57,580 --> 00:54:00,520
The return instruction pops
the return address off the

1040
00:54:00,520 --> 00:54:02,410
stack and returns
to the caller.

1041
00:54:02,410 --> 00:54:05,520
It basically says, oh, I know
where the return address is.

1042
00:54:05,520 --> 00:54:09,790
I slam that into the current
instruction pointer, and that

1043
00:54:09,790 --> 00:54:13,140
becomes the next instruction
that's executed.

1044
00:54:13,140 --> 00:54:16,080
Now, there are some software
conventions that are used

1045
00:54:16,080 --> 00:54:17,210
that's helpful to know.

1046
00:54:17,210 --> 00:54:20,030
Besides those instruction
registers, some of the

1047
00:54:20,030 --> 00:54:23,300
registers are expected to be
saved by the caller, some are

1048
00:54:23,300 --> 00:54:26,930
expected to be saved
by the callee.

1049
00:54:26,930 --> 00:54:31,460
You're free to violate this in
your own little piece of code

1050
00:54:31,460 --> 00:54:34,690
as long as if you're calling
something else,

1051
00:54:34,690 --> 00:54:37,260
you're obeying it.

1052
00:54:37,260 --> 00:54:40,090
So you don't have obey this
convention in the code you

1053
00:54:40,090 --> 00:54:44,760
write unless you want to
interoperate with other stuff.

1054
00:54:44,760 --> 00:54:48,320
So if you, for example, have a
leaf procedure, you can decide

1055
00:54:48,320 --> 00:54:50,880
for that leaf procedure, oh,
I'm going to make something

1056
00:54:50,880 --> 00:54:54,240
callee saved that was caller
saved or whatever as long as

1057
00:54:54,240 --> 00:54:56,840
by the time you return you've
cleaned everything up for the

1058
00:54:56,840 --> 00:54:58,450
rest of the world.

1059
00:54:58,450 --> 00:55:00,800
So these are conventions.

1060
00:55:00,800 --> 00:55:03,575
But for the most part, you're
not going to violate these,

1061
00:55:03,575 --> 00:55:06,350
and the code that the compiler
generates doesn't violate

1062
00:55:06,350 --> 00:55:09,460
these because it expects
everything to interoperate.

1063
00:55:09,460 --> 00:55:11,980
So here's how the subroutine
linkage works.

1064
00:55:11,980 --> 00:55:15,100
We're going to do an example
here where function A calls

1065
00:55:15,100 --> 00:55:18,200
function B which will call
function C. And right now,

1066
00:55:18,200 --> 00:55:21,450
we're at the point we're
executing B. And so on the

1067
00:55:21,450 --> 00:55:29,600
stack are the arguments that
were passed from A to B that

1068
00:55:29,600 --> 00:55:32,020
did not fit within
the registers.

1069
00:55:32,020 --> 00:55:33,050
So normally most of the

1070
00:55:33,050 --> 00:55:34,510
arguments are within registers.

1071
00:55:34,510 --> 00:55:38,750
But if you exceed the six
registers then, because you

1072
00:55:38,750 --> 00:55:40,430
have a long argument
list, then it gets

1073
00:55:40,430 --> 00:55:41,420
passed on the stack.

1074
00:55:41,420 --> 00:55:43,020
And here's where
it gets passed.

1075
00:55:43,020 --> 00:55:45,890
The next thing is B's
return address.

1076
00:55:45,890 --> 00:55:48,200
This is the thing that
got smashed in there

1077
00:55:48,200 --> 00:55:49,890
when you did the call.

1078
00:55:49,890 --> 00:55:51,620
It got pushed onto the stack.

1079
00:55:51,620 --> 00:55:54,290
And then there's what's called a
base pointer for A. And this

1080
00:55:54,290 --> 00:55:57,790
is the way that A ends up
accessing its local variables.

1081
00:55:57,790 --> 00:56:01,110
And then there's a separate
region here where it's going

1082
00:56:01,110 --> 00:56:06,920
to put arguments from B to B's
callees if they exceed the six

1083
00:56:06,920 --> 00:56:13,440
registers, if any of the things
that B is calling

1084
00:56:13,440 --> 00:56:15,130
require more than the
six registers.

1085
00:56:15,130 --> 00:56:16,850
So let's just take a look.

1086
00:56:16,850 --> 00:56:22,320
So function B can access its
nonregister values by indexing

1087
00:56:22,320 --> 00:56:25,600
off of RBP.

1088
00:56:25,600 --> 00:56:28,480
So these we say, these are
in a linkage block.

1089
00:56:28,480 --> 00:56:30,250
And the reason is because
it's actually part of

1090
00:56:30,250 --> 00:56:31,850
A's frame as well.

1091
00:56:31,850 --> 00:56:35,380
It's a shared part of the frame
where A stores it into

1092
00:56:35,380 --> 00:56:38,010
memory and then B's going to
fetch it out of memory.

1093
00:56:38,010 --> 00:56:39,870
And that's the linkage block.

1094
00:56:39,870 --> 00:56:41,900
So this is positive in memory.

1095
00:56:41,900 --> 00:56:47,500
So if I use a positive offset,
I then go up to getting the

1096
00:56:47,500 --> 00:56:55,820
arguments from A. Then it can
access its local variables

1097
00:56:55,820 --> 00:56:58,510
from the base point with a
negative offset because we're

1098
00:56:58,510 --> 00:56:59,760
growing down in memory.

1099
00:56:59,760 --> 00:57:02,790

1100
00:57:02,790 --> 00:57:08,640
Now, if it wants to call C, what
it does is it places the

1101
00:57:08,640 --> 00:57:15,680
nonregister arguments into the
reserved linkage block here,

1102
00:57:15,680 --> 00:57:18,730
which are arguments from
B to B's callees.

1103
00:57:18,730 --> 00:57:21,890
And that once again acts just as
if they're local variables.

1104
00:57:21,890 --> 00:57:28,530
It's positive index off of RBP,
sorry, negative offset

1105
00:57:28,530 --> 00:57:29,580
off of RBP.

1106
00:57:29,580 --> 00:57:33,860
So it pushes those things into
the argument, into that

1107
00:57:33,860 --> 00:57:38,080
region, if it needs to
use that region.

1108
00:57:38,080 --> 00:57:42,130
Then we actually, once it's done
that, we have the call.

1109
00:57:42,130 --> 00:57:46,920
So B calls C which saves the
return address for B on the

1110
00:57:46,920 --> 00:57:52,440
stack, so it saves it on the
stack, and then transfers

1111
00:57:52,440 --> 00:58:00,120
control to C. So now it starts
executing C's code.

1112
00:58:00,120 --> 00:58:01,780
And what does C do?

1113
00:58:01,780 --> 00:58:05,120
So C is going to have to advance
these pointers to

1114
00:58:05,120 --> 00:58:08,720
refer to its region
rather than B's.

1115
00:58:08,720 --> 00:58:12,770
It does it by saving B's base
pointer on the stack.

1116
00:58:12,770 --> 00:58:15,695
So it saves this pointer here so
that it can restore it when

1117
00:58:15,695 --> 00:58:16,950
it returns.

1118
00:58:16,950 --> 00:58:21,010
It advances, it sets its new
base pointer to be where the

1119
00:58:21,010 --> 00:58:25,850
stack pointer is now and then
advances the stack pointer to

1120
00:58:25,850 --> 00:58:29,520
allocate space for C's local
variables and linkage blocks.

1121
00:58:29,520 --> 00:58:30,770
Watch, here we go.

1122
00:58:30,770 --> 00:58:36,690

1123
00:58:36,690 --> 00:58:38,900
So that ends up being
C's frame.

1124
00:58:38,900 --> 00:58:43,360
So notice that B's frame and C's
frame are overlapping in

1125
00:58:43,360 --> 00:58:45,157
the linkage block
between them.

1126
00:58:45,157 --> 00:58:49,220

1127
00:58:49,220 --> 00:58:52,800
Now, if a function never
performs stack allocations

1128
00:58:52,800 --> 00:58:57,090
except during function calls,
there's a great compile time

1129
00:58:57,090 --> 00:59:00,810
optimization that the compiler
will often do.

1130
00:59:00,810 --> 00:59:03,350
And what it will do
is realize that

1131
00:59:03,350 --> 00:59:06,620
this distance is constant.

1132
00:59:06,620 --> 00:59:10,450
So, therefore, it doesn't
need RBP.

1133
00:59:10,450 --> 00:59:15,190
It can just do the math and
index everything off of RSP as

1134
00:59:15,190 --> 00:59:19,440
long as RSP is always the same,
for example for C, when

1135
00:59:19,440 --> 00:59:21,840
C is executing.

1136
00:59:21,840 --> 00:59:24,400
There's certain C commands
like [? alaka ?]

1137
00:59:24,400 --> 00:59:26,310
which changed the
stack pointer.

1138
00:59:26,310 --> 00:59:31,040
If you use those, the compiler
can't do that optimization.

1139
00:59:31,040 --> 00:59:36,200
But if the storage on the stack
never changes for a

1140
00:59:36,200 --> 00:59:39,490
given frame, then it's free
to make this optimization.

1141
00:59:39,490 --> 00:59:42,360
So you'll see code where RBP
has been optimized away.

1142
00:59:42,360 --> 00:59:45,590

1143
00:59:45,590 --> 00:59:48,340
How about some questions
before we go

1144
00:59:48,340 --> 00:59:50,030
on and do an example.

1145
00:59:50,030 --> 00:59:51,443
Yeah?

1146
00:59:51,443 --> 00:59:52,856
AUDIENCE: [INAUDIBLE]

1147
00:59:52,856 --> 00:59:56,624
should there be A's return
address, where you try to

1148
00:59:56,624 --> 00:59:59,094
[INAUDIBLE]?

1149
00:59:59,094 --> 01:00:00,390
CHARLES LEISERSON: Up?

1150
01:00:00,390 --> 01:00:02,360
Oh, this should be,
sorry, this should

1151
01:00:02,360 --> 01:00:03,310
be A's return address.

1152
01:00:03,310 --> 01:00:04,420
Yes, you're right.

1153
01:00:04,420 --> 01:00:05,670
OK, good, typo.

1154
01:00:05,670 --> 01:00:09,090

1155
01:00:09,090 --> 01:00:10,956
Is somebody catching
my typos to--

1156
01:00:10,956 --> 01:00:14,080

1157
01:00:14,080 --> 01:00:16,640
OK, yep, a good one, that should
be A's return address.

1158
01:00:16,640 --> 01:00:18,410
Sorry about that.

1159
01:00:18,410 --> 01:00:20,560
This is B's return address.

1160
01:00:20,560 --> 01:00:21,460
Any other questions?

1161
01:00:21,460 --> 01:00:22,070
That's good.

1162
01:00:22,070 --> 01:00:23,720
That means you understand
something.

1163
01:00:23,720 --> 01:00:24,970
Hooray.

1164
01:00:24,970 --> 01:00:29,980

1165
01:00:29,980 --> 01:00:31,750
So let's do an example.

1166
01:00:31,750 --> 01:00:34,450
So here's my fib example.

1167
01:00:34,450 --> 01:00:38,150
And I compiled this with
minus oh zero.

1168
01:00:38,150 --> 01:00:41,620
Because when I compiled it
with minus oh three, I

1169
01:00:41,620 --> 01:00:44,650
couldn't understand
what was going on.

1170
01:00:44,650 --> 01:00:48,000
So I compiled this with minus oh
zero, which gives me really

1171
01:00:48,000 --> 01:00:48,910
unoptimized code.

1172
01:00:48,910 --> 01:00:52,150
And that lets me be the
compiler optimizer.

1173
01:00:52,150 --> 01:00:54,920
So here's the code that
it generates.

1174
01:00:54,920 --> 01:00:56,720
So we can take a look at
a few things here.

1175
01:00:56,720 --> 01:00:59,370
First of all is declaring
fib to be a global.

1176
01:00:59,370 --> 01:01:00,880
And it's got some other
things here.

1177
01:01:00,880 --> 01:01:03,640
I actually took out some of the
directives that were in

1178
01:01:03,640 --> 01:01:05,590
here that were irrelevant
for our purposes.

1179
01:01:05,590 --> 01:01:07,960
If you actually compile it,
there's a lot more directives

1180
01:01:07,960 --> 01:01:10,680
that are stuck in there and a
lot more labels and things

1181
01:01:10,680 --> 01:01:16,700
that you don't need
to understand it.

1182
01:01:16,700 --> 01:01:18,340
There are two labels here.

1183
01:01:18,340 --> 01:01:24,850
And so you can see here
basically what's going on is

1184
01:01:24,850 --> 01:01:29,090
we're first of all doing the
advancing of the base pointer

1185
01:01:29,090 --> 01:01:31,530
and advancing the stack
pointer here.

1186
01:01:31,530 --> 01:01:34,380
That's doing that operation that
I showed you, those of

1187
01:01:34,380 --> 01:01:37,680
moving the base and
stack pointer up.

1188
01:01:37,680 --> 01:01:45,170
And then at the end here this
is equivalent to doing, a

1189
01:01:45,170 --> 01:01:47,750
leave instruction is equivalent
to undoing that.

1190
01:01:47,750 --> 01:01:52,110
So Intel lets you do one leave
instruction rather than making

1191
01:01:52,110 --> 01:01:55,580
you put these instructions
in every time.

1192
01:01:55,580 --> 01:01:59,230
It's exactly the same thing.

1193
01:01:59,230 --> 01:02:01,590
But in any case, let's
just sort of see

1194
01:02:01,590 --> 01:02:02,940
what's going on here.

1195
01:02:02,940 --> 01:02:07,370
So we're pushing some storage.

1196
01:02:07,370 --> 01:02:10,080
This is saving a
register here.

1197
01:02:10,080 --> 01:02:14,750
We're then advancing the stack
pointer to store 24 bytes of

1198
01:02:14,750 --> 01:02:16,580
temporary storage.

1199
01:02:16,580 --> 01:02:19,950
And then we're start to do
some computations here.

1200
01:02:19,950 --> 01:02:22,580
This looks like we're comparing
one with something

1201
01:02:22,580 --> 01:02:24,880
and then doing a ja.

1202
01:02:24,880 --> 01:02:27,510
So this is a jump above.

1203
01:02:27,510 --> 01:02:29,370
This is the unsigned version.

1204
01:02:29,370 --> 01:02:32,380
What you're looking is to see,
here we say if n is less than

1205
01:02:32,380 --> 01:02:36,450
two, in fact, what it's doing is
saying if n is greater than

1206
01:02:36,450 --> 01:02:38,770
one go to L4.

1207
01:02:38,770 --> 01:02:42,280

1208
01:02:42,280 --> 01:02:43,640
So it's actually doing
the other one.

1209
01:02:43,640 --> 01:02:47,020
So you can see then L4 is, what
happens is the one that

1210
01:02:47,020 --> 01:02:50,200
has the two calls to fib,
recursive calls, so that's

1211
01:02:50,200 --> 01:02:52,850
this part of the code, and
it's doing that if it's

1212
01:02:52,850 --> 01:02:53,470
greater than one.

1213
01:02:53,470 --> 01:02:58,310
And otherwise it's going to
execute these instructions,

1214
01:02:58,310 --> 01:03:01,820
which are basically
returning n.

1215
01:03:01,820 --> 01:03:04,020
And so it basically does
some computations.

1216
01:03:04,020 --> 01:03:07,510
And then both of them converge
here where it moves the

1217
01:03:07,510 --> 01:03:11,490
results and then pops
it off and so forth.

1218
01:03:11,490 --> 01:03:13,810
So that's sort of the outline
of what's going on there.

1219
01:03:13,810 --> 01:03:16,920
So let's dive in here a little
bit and sort of see what's

1220
01:03:16,920 --> 01:03:21,820
going on, see if we can read
this a little bit more closely

1221
01:03:21,820 --> 01:03:24,180
and whether we can
optimize it.

1222
01:03:24,180 --> 01:03:26,500
So the first thing that I
noticed in looking at this is

1223
01:03:26,500 --> 01:03:30,330
look at all this memory
addressing that we're doing.

1224
01:03:30,330 --> 01:03:37,070
What do you suppose this thing
is, minus 16% RBP?

1225
01:03:37,070 --> 01:03:38,080
So this is the base pointer.

1226
01:03:38,080 --> 01:03:42,340
So this is a local variable
because it's a negative offset

1227
01:03:42,340 --> 01:03:43,790
off of the base pointer.

1228
01:03:43,790 --> 01:03:45,250
What do you think it's
doing there?

1229
01:03:45,250 --> 01:03:48,720

1230
01:03:48,720 --> 01:03:49,970
What's stored in here?

1231
01:03:49,970 --> 01:03:54,880

1232
01:03:54,880 --> 01:03:58,960
Yeah, this is where
n is being stored.

1233
01:03:58,960 --> 01:03:59,790
Because what are we doing?

1234
01:03:59,790 --> 01:04:04,240
We're trying to compare n with
one here even though it says

1235
01:04:04,240 --> 01:04:05,470
two up there.

1236
01:04:05,470 --> 01:04:08,970
We're comparing it
with one here.

1237
01:04:08,970 --> 01:04:11,460
And so I look at that, and I
say, look, I'm comparing it

1238
01:04:11,460 --> 01:04:19,710
with one, then I'm jumping to
L4, then I jump to L4 or not.

1239
01:04:19,710 --> 01:04:21,310
And then let's say I don't.

1240
01:04:21,310 --> 01:04:25,320
Well, then the first thing I
do is I move n into RAX.

1241
01:04:25,320 --> 01:04:28,530
But wait a minute, I just
compared it with that.

1242
01:04:28,530 --> 01:04:29,230
So I'm accessing n again.

1243
01:04:29,230 --> 01:04:31,100
I'm accessing it a third time.

1244
01:04:31,100 --> 01:04:32,930
How about if I try to
store that stuff

1245
01:04:32,930 --> 01:04:34,180
in a register instead?

1246
01:04:34,180 --> 01:04:36,520

1247
01:04:36,520 --> 01:04:39,830
So what I did is I picked the
RDI register, because that one

1248
01:04:39,830 --> 01:04:44,630
happens to be available, and
I said do they, if you look

1249
01:04:44,630 --> 01:04:45,320
here, what did we do?

1250
01:04:45,320 --> 01:04:51,350
We stored RDI, which is the
first argument, into memory.

1251
01:04:51,350 --> 01:04:53,270
And then we compared
with it in memory.

1252
01:04:53,270 --> 01:04:54,640
Why don't we compare
with it in RDI?

1253
01:04:54,640 --> 01:04:57,230

1254
01:04:57,230 --> 01:04:57,520
Right?

1255
01:04:57,520 --> 01:05:02,655
Duh, stupid compiler, well,
because I had minus oh zero.

1256
01:05:02,655 --> 01:05:05,610

1257
01:05:05,610 --> 01:05:07,590
OK, so I can do that
improvement.

1258
01:05:07,590 --> 01:05:11,140
So what I did was I edited it to
put RDI there and RDI here

1259
01:05:11,140 --> 01:05:12,410
and RDI here.

1260
01:05:12,410 --> 01:05:15,990
And I went up and I said,
what about RDI here?

1261
01:05:15,990 --> 01:05:17,410
Why didn't I replace that one?

1262
01:05:17,410 --> 01:05:27,130

1263
01:05:27,130 --> 01:05:28,650
No, there's no loop
going on here.

1264
01:05:28,650 --> 01:05:29,694
It's recursion.

1265
01:05:29,694 --> 01:05:30,944
AUDIENCE: [INAUDIBLE PHRASE]

1266
01:05:30,944 --> 01:05:33,870

1267
01:05:33,870 --> 01:05:35,690
CHARLES LEISERSON: Yeah, the
problem is that when I call

1268
01:05:35,690 --> 01:05:41,020
fib, RDI gets garbaged on me.

1269
01:05:41,020 --> 01:05:45,590
Because RDI is going to be
the first argument to--

1270
01:05:45,590 --> 01:05:48,330
See it's being garbaged here?

1271
01:05:48,330 --> 01:05:51,360
It's garbage as far as
my use of it for n.

1272
01:05:51,360 --> 01:05:55,790
It's being used to pass n minus
1 as the argument to the

1273
01:05:55,790 --> 01:05:57,040
recursive call.

1274
01:05:57,040 --> 01:06:02,670

1275
01:06:02,670 --> 01:06:05,500
So I can't replace this one
after fib because RDI no

1276
01:06:05,500 --> 01:06:06,420
longer has it.

1277
01:06:06,420 --> 01:06:07,810
Because I had to leave it.

1278
01:06:07,810 --> 01:06:12,350
But even so I went from 5.45
seconds for the original code

1279
01:06:12,350 --> 01:06:14,930
to 4.09 seconds when
I compile that just

1280
01:06:14,930 --> 01:06:16,300
with that little change.

1281
01:06:16,300 --> 01:06:19,360
I felt pretty good.

1282
01:06:19,360 --> 01:06:20,200
I felt pretty good.

1283
01:06:20,200 --> 01:06:23,090
So then I wanted more.

1284
01:06:23,090 --> 01:06:23,710
That was fun.

1285
01:06:23,710 --> 01:06:24,960
I wanted more.

1286
01:06:24,960 --> 01:06:27,570

1287
01:06:27,570 --> 01:06:30,660
So what was the next
thing I noticed?

1288
01:06:30,660 --> 01:06:31,720
I noticed that--

1289
01:06:31,720 --> 01:06:34,260
And by the way almost
all the things, that

1290
01:06:34,260 --> 01:06:35,620
stuff I did last night.

1291
01:06:35,620 --> 01:06:37,870
This is what I did an hour
before class so we'll see

1292
01:06:37,870 --> 01:06:39,960
whether it--

1293
01:06:39,960 --> 01:06:44,260
So then I noticed that, look,
we're moving this stuff here.

1294
01:06:44,260 --> 01:06:46,420
We keep using minus 24.

1295
01:06:46,420 --> 01:06:48,870
And once again, memory
operations are expensive

1296
01:06:48,870 --> 01:06:50,290
compared to register
operations.

1297
01:06:50,290 --> 01:06:51,340
Let me try to get rid them.

1298
01:06:51,340 --> 01:06:52,850
What do you suppose
is in here?

1299
01:06:52,850 --> 01:06:59,160

1300
01:06:59,160 --> 01:07:08,990
So look, we move RAX into the
local variable minus 24.

1301
01:07:08,990 --> 01:07:11,310
And then we jump to L5.

1302
01:07:11,310 --> 01:07:13,370
And we move minus 24 into RAX.

1303
01:07:13,370 --> 01:07:16,950

1304
01:07:16,950 --> 01:07:19,490
That seems kind of
unnecessary.

1305
01:07:19,490 --> 01:07:22,470
Here we move RBX
into minus 24.

1306
01:07:22,470 --> 01:07:26,490
Then we move minus
24 into RAX.

1307
01:07:26,490 --> 01:07:29,350
What is this value first of all
that I'm storing there?

1308
01:07:29,350 --> 01:07:35,840

1309
01:07:35,840 --> 01:07:39,050
What's going to be in
RAX at the very end?

1310
01:07:39,050 --> 01:07:42,730

1311
01:07:42,730 --> 01:07:46,620
RAX is the return value.

1312
01:07:46,620 --> 01:07:50,010
So I'm trying to save, here in
this case, this is the branch

1313
01:07:50,010 --> 01:07:52,010
where I just want to return n.

1314
01:07:52,010 --> 01:07:55,185
I just want to put RAX
to have it return, be

1315
01:07:55,185 --> 01:07:57,720
in RAX when I return.

1316
01:07:57,720 --> 01:07:58,910
So I've got the value here.

1317
01:07:58,910 --> 01:07:59,440
It's just n.

1318
01:07:59,440 --> 01:08:00,395
It was in RDI.

1319
01:08:00,395 --> 01:08:01,750
It's now in RAX.

1320
01:08:01,750 --> 01:08:04,240
But that's clearly
unnecessary.

1321
01:08:04,240 --> 01:08:07,940
Why go put it into memory and
then take it back out again?

1322
01:08:07,940 --> 01:08:11,140
And here just put it
in RAX directly.

1323
01:08:11,140 --> 01:08:12,760
So that's what I did.

1324
01:08:12,760 --> 01:08:17,180
I basically, instead of moving
it here, I changed this

1325
01:08:17,180 --> 01:08:20,180
instruction that said add it
and put in RBX, I said, no,

1326
01:08:20,180 --> 01:08:21,810
don't put it in RBX.

1327
01:08:21,810 --> 01:08:27,200
Let's just add RBX into RAX,
and then it's right there.

1328
01:08:27,200 --> 01:08:30,800
And this one, get rid of those
so that it's now moved into

1329
01:08:30,800 --> 01:08:33,840
RAX and it's in RAX.

1330
01:08:33,840 --> 01:08:35,000
So I did that.

1331
01:08:35,000 --> 01:08:39,319
I dropped to 3.9 seconds.

1332
01:08:39,319 --> 01:08:40,630
That felt pretty good, too.

1333
01:08:40,630 --> 01:08:45,229
In addition, I got rid of
this extra variable.

1334
01:08:45,229 --> 01:08:49,600
So now I could actually reduce
my storage requirements.

1335
01:08:49,600 --> 01:08:53,000
However, when I measured it
with this being 24 and not

1336
01:08:53,000 --> 01:08:55,970
being 24, it was
the same speed.

1337
01:08:55,970 --> 01:08:58,870
So it's like, eh, but
I didn't want to

1338
01:08:58,870 --> 01:09:02,260
waste the storage anyway.

1339
01:09:02,260 --> 01:09:05,200
So then I looked a little
bit further.

1340
01:09:05,200 --> 01:09:10,660
And I noticed that I
want to get rid of

1341
01:09:10,660 --> 01:09:13,750
this access to n here.

1342
01:09:13,750 --> 01:09:17,450
So basically I'm subtracting
it, and I'm storing n.

1343
01:09:17,450 --> 01:09:18,569
How can I get rid of it?

1344
01:09:18,569 --> 01:09:21,020
And this took me a little
while to figure out.

1345
01:09:21,020 --> 01:09:23,090
What I realized is, look,
we're storing

1346
01:09:23,090 --> 01:09:27,399
stuff away in RBX.

1347
01:09:27,399 --> 01:09:31,770
We have RBX as an available
register because I saved the

1348
01:09:31,770 --> 01:09:34,680
value of RBX with this push
instruction there.

1349
01:09:34,680 --> 01:09:36,359
So RBX is an available
register.

1350
01:09:36,359 --> 01:09:46,120
We're using it to keep the
return value of the

1351
01:09:46,120 --> 01:09:47,370
first call to fib.

1352
01:09:47,370 --> 01:09:49,510

1353
01:09:49,510 --> 01:09:52,540
So I'm going to use it for their
first call to fib so

1354
01:09:52,540 --> 01:09:55,000
that when I make the second call
to fib I can then add the

1355
01:09:55,000 --> 01:09:56,250
two things together.

1356
01:09:56,250 --> 01:09:59,360

1357
01:09:59,360 --> 01:10:04,030
Well, how about if before the
first call the fib, why don't

1358
01:10:04,030 --> 01:10:08,150
I use it to store the value of n
and then use it to store the

1359
01:10:08,150 --> 01:10:15,810
value of the return value
of fib of n minus 1?

1360
01:10:15,810 --> 01:10:17,590
So I did that.

1361
01:10:17,590 --> 01:10:22,640
And that took a little
bit of moving things

1362
01:10:22,640 --> 01:10:23,550
around a little bit.

1363
01:10:23,550 --> 01:10:26,380
But I managed to get rid of it
by using RBX for two different

1364
01:10:26,380 --> 01:10:31,240
purposes, one to store the
temporary, the value of n, and

1365
01:10:31,240 --> 01:10:39,010
the other to store the return
value when I need it.

1366
01:10:39,010 --> 01:10:40,300
And when I did that, I
got it all the way

1367
01:10:40,300 --> 01:10:44,640
down to 3.61 seconds.

1368
01:10:44,640 --> 01:10:47,380
I actually ran it with
minus oh three,

1369
01:10:47,380 --> 01:10:49,650
took about two seconds.

1370
01:10:49,650 --> 01:10:51,720
So I think I can keep
my day job.

1371
01:10:51,720 --> 01:10:54,280

1372
01:10:54,280 --> 01:10:57,070
But kind of fun to go in and
sort of see what are the

1373
01:10:57,070 --> 01:10:58,500
things that can be done.

1374
01:10:58,500 --> 01:11:00,520
And you can get a very good
sense of what's going on.

1375
01:11:00,520 --> 01:11:03,000
The more important thing is
when you look at compilers

1376
01:11:03,000 --> 01:11:05,280
generating your code, as we
saw on the last lecture on

1377
01:11:05,280 --> 01:11:10,480
profiling, you can see, oh, it
did something silly here.

1378
01:11:10,480 --> 01:11:12,000
So you can actually go
and say, oh, it's

1379
01:11:12,000 --> 01:11:13,130
doing something silly.

1380
01:11:13,130 --> 01:11:14,990
We can do a better
job than that.

1381
01:11:14,990 --> 01:11:17,700
Or, oh, I didn't realize I'd
declared this an int when in

1382
01:11:17,700 --> 01:11:23,100
fact, if I declared it a
unsigned int 64, it actually

1383
01:11:23,100 --> 01:11:25,260
would produce faster,
better code.

1384
01:11:25,260 --> 01:11:25,480
Yeah?

1385
01:11:25,480 --> 01:11:26,868
Question?

1386
01:11:26,868 --> 01:11:27,362
AUDIENCE: Sorry.

1387
01:11:27,362 --> 01:11:32,302
So when you said that when you
run it with minus oh three,

1388
01:11:32,302 --> 01:11:34,772
what you're just saying is even
though you optimized the

1389
01:11:34,772 --> 01:11:35,266
[INAUDIBLE]?

1390
01:11:35,266 --> 01:11:38,420
CHARLES LEISERSON: As the
compiler, that's why I said I

1391
01:11:38,420 --> 01:11:39,670
can keep my day job.

1392
01:11:39,670 --> 01:11:43,060

1393
01:11:43,060 --> 01:11:46,650
So simple optimization
strategies, if you're playing

1394
01:11:46,650 --> 01:11:49,090
with things, is you try to keep
values and registers to

1395
01:11:49,090 --> 01:11:51,320
eliminate excess
memory traffic.

1396
01:11:51,320 --> 01:11:55,550
You can optimize naive function
call linkage.

1397
01:11:55,550 --> 01:11:58,790
And the most important thing
probably is constant fold.

1398
01:11:58,790 --> 01:12:03,690
Look to see where you've
got constants that

1399
01:12:03,690 --> 01:12:05,060
can be combined together.

1400
01:12:05,060 --> 01:12:07,510
There are other optimizations
that compilers do like common

1401
01:12:07,510 --> 01:12:10,300
subexpression elimination
and so forth.

1402
01:12:10,300 --> 01:12:12,110
But these are sort of the ones,
if you're doing it by

1403
01:12:12,110 --> 01:12:14,870
hand, these are sort of things
to focus on, particularly

1404
01:12:14,870 --> 01:12:20,690
number one, just get rid of
excess memory traffic.

1405
01:12:20,690 --> 01:12:23,120
Let me say, by the way, in doing
this I also went down a

1406
01:12:23,120 --> 01:12:25,670
bunch of dead ends, things that
I said, oh, this should

1407
01:12:25,670 --> 01:12:28,840
definitely save, and
then it was slower.

1408
01:12:28,840 --> 01:12:31,700
And then we look at it, and it
turns out, oh, my branch

1409
01:12:31,700 --> 01:12:34,410
misprediction rate is going
way up and so forth.

1410
01:12:34,410 --> 01:12:35,540
That's why you have
a profiler.

1411
01:12:35,540 --> 01:12:36,830
Because you don't want
to do this blind.

1412
01:12:36,830 --> 01:12:39,780

1413
01:12:39,780 --> 01:12:44,860
Now, how does the compiler
compile some common high level

1414
01:12:44,860 --> 01:12:46,110
structures?

1415
01:12:46,110 --> 01:12:48,130

1416
01:12:48,130 --> 01:12:52,590
So if you have a conditional,
for example, if p, do the

1417
01:12:52,590 --> 01:12:56,360
ctrue clause, else do the cfalse
clause, what it does

1418
01:12:56,360 --> 01:12:59,910
basically is it generates
instructions to evaluate p.

1419
01:12:59,910 --> 01:13:02,570
And then it does a jump with the
condition to see if p is

1420
01:13:02,570 --> 01:13:06,650
false to the else clause and
executes those instruction.

1421
01:13:06,650 --> 01:13:11,730
And then otherwise it passes
through, does the true clause,

1422
01:13:11,730 --> 01:13:12,930
and then jumps to the end.

1423
01:13:12,930 --> 01:13:16,800
And you'll see that pattern in
the code when you look at it.

1424
01:13:16,800 --> 01:13:18,550
So that's a very common
pattern for doing

1425
01:13:18,550 --> 01:13:21,370
conditionals.

1426
01:13:21,370 --> 01:13:24,660
Compiling while loops is kind
of interesting because most

1427
01:13:24,660 --> 01:13:27,520
while loops start
out with a jump.

1428
01:13:27,520 --> 01:13:29,440
So here are the instructions
for the body

1429
01:13:29,440 --> 01:13:30,810
of the while loop.

1430
01:13:30,810 --> 01:13:31,850
And here's the test.

1431
01:13:31,850 --> 01:13:35,280
And what they usually do is they
jump to the test, they

1432
01:13:35,280 --> 01:13:37,650
evaluate the condition,
and if it's true,

1433
01:13:37,650 --> 01:13:38,460
they jump to the loop.

1434
01:13:38,460 --> 01:13:40,470
Otherwise, they fall through.

1435
01:13:40,470 --> 01:13:43,740
And then they go back, do the
loop sometimes for the first

1436
01:13:43,740 --> 01:13:44,460
time, et cetera.

1437
01:13:44,460 --> 01:13:48,700
So that's kind of the pattern
for a while loop.

1438
01:13:48,700 --> 01:13:51,900
For a for loop, they basically
just convert it

1439
01:13:51,900 --> 01:13:53,610
into a while loop.

1440
01:13:53,610 --> 01:13:56,530
You basically take the
initialization code.

1441
01:13:56,530 --> 01:13:58,030
You execute that.

1442
01:13:58,030 --> 01:14:00,780
Then while the condition is
true, you do the code followed

1443
01:14:00,780 --> 01:14:03,150
by whatever the next code is.

1444
01:14:03,150 --> 01:14:08,510
And so it ends up converting
for loops into while loops.

1445
01:14:08,510 --> 01:14:13,230
Now, arrays are, how do we go
about implementing data types?

1446
01:14:13,230 --> 01:14:18,670
Arrays are just blocks
of memory.

1447
01:14:18,670 --> 01:14:23,440
So you can have basically three
different types of array

1448
01:14:23,440 --> 01:14:26,050
depending upon where it gets
allocated, either allocated in

1449
01:14:26,050 --> 01:14:28,830
the data segment, allocate
on the heap, or

1450
01:14:28,830 --> 01:14:30,080
allocated on the stack.

1451
01:14:30,080 --> 01:14:32,830

1452
01:14:32,830 --> 01:14:35,090
Sometimes even the static
arrays these days can be

1453
01:14:35,090 --> 01:14:37,600
allocated in the code segment,
if you're not

1454
01:14:37,600 --> 01:14:40,010
going to change them.

1455
01:14:40,010 --> 01:14:47,600
So one thing is to understand
that arrays and pointers are

1456
01:14:47,600 --> 01:14:50,580
almost the same thing.

1457
01:14:50,580 --> 01:14:53,820
If you have an array, that's a
pointer to a place in memory

1458
01:14:53,820 --> 01:14:56,360
where the array begins.

1459
01:14:56,360 --> 01:15:00,260
And a zero is the same as the
value you get when you

1460
01:15:00,260 --> 01:15:04,210
dereference the pointer to a.

1461
01:15:04,210 --> 01:15:08,300
A pointer, if you think about
it, is actually just an index

1462
01:15:08,300 --> 01:15:10,970
into the array of all memory.

1463
01:15:10,970 --> 01:15:13,920
And the hardware allows
you to index into the

1464
01:15:13,920 --> 01:15:14,890
array of all memory.

1465
01:15:14,890 --> 01:15:18,530
Well, it can also allow you to
index into any subregion of

1466
01:15:18,530 --> 01:15:19,570
that memory.

1467
01:15:19,570 --> 01:15:21,740
And that's why arrays and
pointers are basically the

1468
01:15:21,740 --> 01:15:22,990
same thing.

1469
01:15:22,990 --> 01:15:25,200

1470
01:15:25,200 --> 01:15:26,400
Here's a little quiz.

1471
01:15:26,400 --> 01:15:28,900
What is eight of a?

1472
01:15:28,900 --> 01:15:32,820

1473
01:15:32,820 --> 01:15:34,300
AUDIENCE: The a elements?

1474
01:15:34,300 --> 01:15:38,230
CHARLES LEISERSON: Yeah, it's
basically a of eight.

1475
01:15:38,230 --> 01:15:41,720
It's basically a of eight
because the addressing that's

1476
01:15:41,720 --> 01:15:43,540
going on is essentially
the same.

1477
01:15:43,540 --> 01:15:45,600
Even though we prefer
to write it--

1478
01:15:45,600 --> 01:15:48,830
If you start writing code like
this, I guarantee that at some

1479
01:15:48,830 --> 01:15:52,355
companies they'll get very angry
at you even though you

1480
01:15:52,355 --> 01:15:55,910
say, it's the same thing.

1481
01:15:55,910 --> 01:15:59,890
But what's going on is you're
actually taking the base, the

1482
01:15:59,890 --> 01:16:02,570
address of a, you're adding
eight to it, and then

1483
01:16:02,570 --> 01:16:05,130
dereferencing that value.

1484
01:16:05,130 --> 01:16:08,680
And indeed, even in C, they
actually do all the coercions,

1485
01:16:08,680 --> 01:16:11,340
even if eight is a different
type, it actually does the

1486
01:16:11,340 --> 01:16:13,260
coercions properly
so that they are

1487
01:16:13,260 --> 01:16:14,620
actually the same thing.

1488
01:16:14,620 --> 01:16:17,860
Because it does them after
it's converted it into a

1489
01:16:17,860 --> 01:16:22,090
dereference of a plus 8.

1490
01:16:22,090 --> 01:16:26,290
So it's kind of interesting
that it works right even

1491
01:16:26,290 --> 01:16:29,420
though when I looked at that I
say, well, what if it's bytes

1492
01:16:29,420 --> 01:16:32,560
verses words and so forth?

1493
01:16:32,560 --> 01:16:32,750
Yeah?

1494
01:16:32,750 --> 01:16:34,004
Question?

1495
01:16:34,004 --> 01:16:37,270
AUDIENCE: Will there be a
performance difference between

1496
01:16:37,270 --> 01:16:41,450
putting data on those three
different arrays?

1497
01:16:41,450 --> 01:16:43,140
CHARLES LEISERSON: Yes,
there can be.

1498
01:16:43,140 --> 01:16:46,730
In particular, static array,
it knows exactly where the

1499
01:16:46,730 --> 01:16:48,770
base pointer is as a constant.

1500
01:16:48,770 --> 01:16:51,470
Whereas the others it has to
actually figure out where it

1501
01:16:51,470 --> 01:16:53,600
is in the heap you need
a pointer to it

1502
01:16:53,600 --> 01:16:54,220
to dereference it.

1503
01:16:54,220 --> 01:16:57,716
It can't put it right into the
instruction stream itself.

1504
01:16:57,716 --> 01:16:58,966
AUDIENCE: [INAUDIBLE]

1505
01:16:58,966 --> 01:17:01,524

1506
01:17:01,524 --> 01:17:03,966
those would be just constant
[INAUDIBLE], right?

1507
01:17:03,966 --> 01:17:08,746
So I can either put that in
the static array or--

1508
01:17:08,746 --> 01:17:10,940
CHARLES LEISERSON: Yes,
generally it's faster to have

1509
01:17:10,940 --> 01:17:14,430
it in a static array
if you can.

1510
01:17:14,430 --> 01:17:16,740
I want to finish up
here so that--

1511
01:17:16,740 --> 01:17:18,250
We have structs.

1512
01:17:18,250 --> 01:17:21,470
Structs are just blocks
of memory also.

1513
01:17:21,470 --> 01:17:23,190
So you can have a bunch
of things here.

1514
01:17:23,190 --> 01:17:27,850
This is a bad way to declare a
struct because the fields are

1515
01:17:27,850 --> 01:17:29,660
stored next to each other
generally in the

1516
01:17:29,660 --> 01:17:32,240
order you give them.

1517
01:17:32,240 --> 01:17:35,470
So here it says it's x and
then i and double.

1518
01:17:35,470 --> 01:17:37,740
What happens here is you
have to be careful

1519
01:17:37,740 --> 01:17:39,930
about alignment issues.

1520
01:17:39,930 --> 01:17:41,400
So if you do [? char, ?]

1521
01:17:41,400 --> 01:17:44,330
it's then got to pad it
out to get to the next

1522
01:17:44,330 --> 01:17:45,740
alignment for an int.

1523
01:17:45,740 --> 01:17:47,460
And it's got to pad that
out to get the next

1524
01:17:47,460 --> 01:17:48,970
alignment for a double.

1525
01:17:48,970 --> 01:17:51,380
Whereas if you do it in the
opposite order, it starts

1526
01:17:51,380 --> 01:17:52,580
packing them.

1527
01:17:52,580 --> 01:17:55,220
So generally it's best to
declare longer fields before

1528
01:17:55,220 --> 01:17:57,915
shorter fields because then you
know when you're finished

1529
01:17:57,915 --> 01:17:59,410
with the longer fields,
you're already

1530
01:17:59,410 --> 01:18:00,880
aligned for shorter fields.

1531
01:18:00,880 --> 01:18:03,660

1532
01:18:03,660 --> 01:18:06,430
Like arrays, there are static,
dynamic, and local structs.

1533
01:18:06,430 --> 01:18:07,340
So that's all they are.

1534
01:18:07,340 --> 01:18:08,965
And so you'll see in
the indexing--

1535
01:18:08,965 --> 01:18:11,590

1536
01:18:11,590 --> 01:18:13,600
There's also stuff that--

1537
01:18:13,600 --> 01:18:18,760
and actually this is important
for one of the binary puzzles

1538
01:18:18,760 --> 01:18:21,960
we gave to figure out
what it does--

1539
01:18:21,960 --> 01:18:23,950
there are what are called
SIMD instruction.

1540
01:18:23,950 --> 01:18:27,220
This is single instruction
multiple data instructions

1541
01:18:27,220 --> 01:18:29,740
where a single instruction
operates on

1542
01:18:29,740 --> 01:18:32,980
multiple pieces of data.

1543
01:18:32,980 --> 01:18:34,550
They operate on smaller
vectors.

1544
01:18:34,550 --> 01:18:40,570
And there are 16 128-bit XMM
registers, which you can view

1545
01:18:40,570 --> 01:18:44,160
as two 64-bit values or
four 32-bit values.

1546
01:18:44,160 --> 01:18:45,860
And you can do an
operation on it.

1547
01:18:45,860 --> 01:18:48,520
So this is used for multimedia,
for streaming

1548
01:18:48,520 --> 01:18:51,180
applications, and so forth where
you're trying to shove a

1549
01:18:51,180 --> 01:18:54,330
lot of data through and you're
doing the same repeated stuff

1550
01:18:54,330 --> 01:18:56,270
on the things at time.

1551
01:18:56,270 --> 01:18:59,160
So there are instructions that
operate on multiple values.

1552
01:18:59,160 --> 01:19:04,750
For example, here we're moving
four 32-bit ints into this

1553
01:19:04,750 --> 01:19:07,370
particular XMM register.

1554
01:19:07,370 --> 01:19:10,650
And similarly here's another
one, we're adding it.

1555
01:19:10,650 --> 01:19:12,750
And you can look at the
manual for these.

1556
01:19:12,750 --> 01:19:18,740
So you may come across these
because they're using up parts

1557
01:19:18,740 --> 01:19:19,450
of the machine.

1558
01:19:19,450 --> 01:19:22,990
Mostly those are the kinds of
things we can say look it up

1559
01:19:22,990 --> 01:19:24,790
in the manual, because nobody's
going to remember all

1560
01:19:24,790 --> 01:19:25,730
those instructions.

1561
01:19:25,730 --> 01:19:32,130
Of course, if you get a job
with one of the graphics

1562
01:19:32,130 --> 01:19:35,760
companies, then you may become
very familiar with these kinds

1563
01:19:35,760 --> 01:19:38,080
of instructions.

1564
01:19:38,080 --> 01:19:41,800
There's a lot more C and C++
constructs that we don't have

1565
01:19:41,800 --> 01:19:42,750
to go into.

1566
01:19:42,750 --> 01:19:46,450
You can have arrays of structs
versus structs of arrays.

1567
01:19:46,450 --> 01:19:48,125
And there can be a difference
in performance.

1568
01:19:48,125 --> 01:19:50,860

1569
01:19:50,860 --> 01:19:56,710
If you have an array of structs,
then it makes it, if

1570
01:19:56,710 --> 01:19:59,530
you're accessing one struct,
you can access the other

1571
01:19:59,530 --> 01:20:01,880
structs very easily.

1572
01:20:01,880 --> 01:20:06,590
But if you're using things like
the SSC instructions,

1573
01:20:06,590 --> 01:20:09,970
then it may be better to
have structs of arrays.

1574
01:20:09,970 --> 01:20:12,140
Because then when you're access
an array, you can

1575
01:20:12,140 --> 01:20:17,160
stream what's called a stride
of one, a regular stride of

1576
01:20:17,160 --> 01:20:20,170
just one memory location after
the next, to do the

1577
01:20:20,170 --> 01:20:22,310
processing.

1578
01:20:22,310 --> 01:20:25,090
So the hardware doesn't work
as well if you skip by

1579
01:20:25,090 --> 01:20:29,140
seventeens to gather things
compared if you just get one

1580
01:20:29,140 --> 01:20:29,980
thing after the next.

1581
01:20:29,980 --> 01:20:32,590
Because there's prefetching
logic that tries to fetch

1582
01:20:32,590 --> 01:20:34,620
things from memory faster.

1583
01:20:34,620 --> 01:20:36,570
There are things like
function pointers.

1584
01:20:36,570 --> 01:20:40,380
So you can have store function
pointer into something and

1585
01:20:40,380 --> 01:20:44,720
then call that function
indirectly.

1586
01:20:44,720 --> 01:20:47,310
There's things like bit
fields in arrays.

1587
01:20:47,310 --> 01:20:49,460
There are objects, virtual
function tables.

1588
01:20:49,460 --> 01:20:52,080
We'll get into some of
these when we do C++.

1589
01:20:52,080 --> 01:20:54,170
And there's a variety of stuff
having to do with memory

1590
01:20:54,170 --> 01:20:56,250
management that we'll
talk about.

1591
01:20:56,250 --> 01:20:59,890
But this is mainly to get you
folks sort of at the level

1592
01:20:59,890 --> 01:21:01,810
where you can sort of
understand and feel

1593
01:21:01,810 --> 01:21:05,170
comfortable with dealing
with the assembler.

1594
01:21:05,170 --> 01:21:07,140
And you'll see that
those resources

1595
01:21:07,140 --> 01:21:08,440
are pretty good resources.

1596
01:21:08,440 --> 01:21:12,440
But the basics are relatively
simple, but it's hard to do it

1597
01:21:12,440 --> 01:21:18,110
without a manual or some online
reference material.

1598
01:21:18,110 --> 01:21:19,360
Any questions?

1599
01:21:19,360 --> 01:21:22,500

1600
01:21:22,500 --> 01:21:26,770
What are the first two lessons
I taught you today?

1601
01:21:26,770 --> 01:21:28,826
Number one is--

1602
01:21:28,826 --> 01:21:30,260
AUDIENCE: [INAUDIBLE]

1603
01:21:30,260 --> 01:21:32,880
CHARLES LEISERSON: --write tests
before you write code.

1604
01:21:32,880 --> 01:21:34,990
And what's the second lesson?

1605
01:21:34,990 --> 01:21:35,750
AUDIENCE: Pair programming.

1606
01:21:35,750 --> 01:21:37,360
CHARLES LEISERSON: Pair
programming, not divide and

1607
01:21:37,360 --> 01:21:41,060
conquer, I teach algorithms
where divide and conquer is a

1608
01:21:41,060 --> 01:21:43,040
fabulous technique.

1609
01:21:43,040 --> 01:21:45,650
With programming, pair
programming is going to have

1610
01:21:45,650 --> 01:21:49,710
you generally get where you
want to get faster than if

1611
01:21:49,710 --> 01:21:51,180
you're programming alone.

1612
01:21:51,180 --> 01:21:54,915

1613
01:21:54,915 --> 01:21:56,750
OK, thank you.

1614
01:21:56,750 --> 01:21:58,720